Abstract

We study the empirical process
$\sup_{f \in F} \left| N^{-1} \sum_{i=1}^N f^2(X_i) - \mathbb{E} f^2 \right|$,
where $F$ is a class of mean-zero functions on a probability space
$(\Omega, \mu)$ and $(X_i)_{i=1}^N$ are selected independently according
to $\mu$.

We present a sharp bound on this supremum that depends on the $\psi_1$
diameter of the class $F$ (rather than on the $\psi_2$ one) and on the
complexity parameter $\gamma_2(F, \psi_2)$. In addition, we present optimal
bounds on the random diameters
$\sup_{f \in F} \max_{|I|=m} \left( \sum_{i \in I} f^2(X_i) \right)^{1/2}$
using the same parameters. As applications, we extend several well known
results in Asymptotic Geometric Analysis to any isotropic, log-concave
ensemble on $\mathbb{R}^n$.



1 Introduction

In this article we study the empirical process

$$\sup_{f \in F} \left| \frac{1}{N} \sum_{i=1}^N f^2(X_i) - \mathbb{E} f^2 \right|, \tag{1.1}$$

where $F$ is a class of functions on the probability space $(\Omega, \mu)$
and $(X_i)_{i=1}^N$ are independent, distributed according to $\mu$.
Properties of this process play an important part in Asymptotic Geometric
Analysis and in Nonparametric Statistics, though even without considering
the possible applications, (1.1) is a natural object. Indeed, a fundamental
problem in Empirical Processes Theory is to understand the way the empirical
(random) structure of a class of functions, obtained by random sampling,
captures the original structure determined by the underlying measure $\mu$.
More accurately, one wishes to relate, with high probability,
$N^{-1} \sum_{i=1}^N \ell(f(X_i))$ to $\mathbb{E} \ell(f)$, uniformly in
$f \in F$, for a reasonable real valued function $\ell$. The two most natural
functions that are considered in this context are $\ell(t) = t$, which leads
to the Uniform Law of Large Numbers, and $\ell(t) = t^2$, which is connected
to properties of the Uniform Central Limit Theorem and gives information on
the way the empirical $\ell_2$ structure of $F$ is connected to the
$L_2(\mu)$ one (see [131 EZ] for an extensive study of these topics).

Despite its importance, bounds on (1.1) are not satisfactory. Standard
empirical processes methods allow one to bound (1.1) only in rather trivial
cases, in which either the class $F$ is bounded in $L_\infty$, or it has a
well behaved envelope function (recall that an envelope function is
$W(\omega) = \sup_{f \in F} |f(\omega)|$). In those cases it is possible to
use contraction methods and control (1.1) using the linear process
$\sup_{f \in F} \left| N^{-1} \sum_{i=1}^N f(X_i) - \mathbb{E} f \right|$,
which is a far simpler object than (1.1). However, if the function class is
not uniformly bounded, or even if it is, but with a very weak uniform bound,
contraction based methods lead to trivial estimates on (1.1).

An alternative approach is to control (1.1) using random parameters of
$F$ that depend on the geometry of typical coordinate projections

$$P_\sigma F = \{ (f(X_1), ..., f(X_N)) : f \in F \},$$

for an independent sample $\sigma = (X_1, ..., X_N)$. The downside of this
approach is that the structure of $P_\sigma F$ itself is often difficult to
handle, let alone that of $P_\sigma F^2$ for $F^2 = \{ f^2 : f \in F \}$.
Moreover, the standard way of relating the geometry of $P_\sigma F^2$ to that
of $P_\sigma F$ also involves contraction methods, resulting in the same type
of problems that have been mentioned above.

To illustrate the difficulty, consider the following, seemingly simple
problem. Let $\Omega = \mathbb{R}^n$ and assume that $\mu$ is a natural
measure on $\mathbb{R}^n$, say the canonical gaussian measure, the uniform
measure on $\{-1, 1\}^n$, or more generally, an isotropic log-concave measure
(see the definitions in Section 2). Let $F$ be the class of linear
functionals on $\mathbb{R}^n$ of Euclidean norm one, that is,
$F = \{ \langle x, \cdot \rangle : x \in S^{n-1} \}$.

Note that $F$ may consist of unbounded functions on $(\Omega, \mu)$, or, at
best, of functions with an $L_\infty$ bound that grows polynomially with the
dimension $n$. It is straightforward to show that contraction based methods
lead to a very loose estimate on (1.1) in such a case. To make things worse,
if one considers a typical sample $(X_i)_{i=1}^N$, the structure of the
ellipsoid $P_\sigma F$ is hard to handle (certainly if all the information
that one has on $\mu$ is that it is an isotropic, log-concave measure). And,
finally, an attempt to bound (1.1) using the structure of the class
$F^2 = \{ f^2 : f \in F \}$ directly, without linearizing, will fail because
$P_\sigma F^2$ is a rather complicated object.

It would be highly desirable to bound (1.1) using a deterministic parameter
of $F$, that is, a metric invariant of $F$ that depends on $\mu$ and not on
$(X_i)_{i=1}^N$, since in many applications (the example mentioned above for
one), $F$ has a simple structure relative to a natural metric. Thus, our aim
here is to obtain bounds on (1.1) that depend on the deterministic structure
of $F$ as a class of functions on $(\Omega, \mu)$. All we will assume is that
$F$ consists of functions that have well behaved tails, but may be unbounded,
and the class may be without a good envelope function.

It turns out that if one wishes to bound
$\mathbb{E} \sup_{f \in F} \left| N^{-1} \sum_{i=1}^N f(X_i) - \mathbb{E} f \right|$
using a deterministic metric structure of $F$, one has to consider metrics
that are stronger than the $L_p(\mu)$ ones (see Lemma 3.6 and Remark 3.7 for
an exact formulation of this observation). More reasonable metrics for such a
goal are the Orlicz norms $\psi_\alpha$ for $1 \le \alpha \le 2$. These norms
are defined via the Young function $\exp(x^\alpha) - 1$ for $\alpha \ge 1$ by

$$\|f\|_{\psi_\alpha} = \inf \{ c > 0 : \mathbb{E} \exp(|f|^\alpha / c^\alpha) \le 2 \}.$$

It is possible to bound the "linear" process using a natural complexity
parameter of $F$ that originated in the theory of Gaussian Processes. This
complexity parameter is defined for any metric space $(T, d)$ and is denoted
by $\gamma_2(T, d)$ (see the book [36] and Section 2 for its definition and
some of its properties). Indeed, it is standard to show that

$$\mathbb{E} \sup_{f \in F} \left| \frac{1}{N} \sum_{i=1}^N f(X_i) - \mathbb{E} f \right| \le c \frac{\gamma_2(F, \psi_2)}{\sqrt{N}},$$

where $c$ is an absolute constant, and that a similar bound holds with high
probability (see Lemma 2.5). Moreover, as we will explain in Section 3.1, it
is impossible to obtain such a bound using a weaker $\psi_\alpha$ metric.

Unfortunately, if one is interested, as we are, in bounds on the empirical
process indexed by $F^2$ using complexity parameters of $F$ itself, a
contraction type argument only yields that

$$\mathbb{E} \sup_{f \in F} \left| \frac{1}{N} \sum_{i=1}^N f^2(X_i) - \mathbb{E} f^2 \right| \le c \sup_{f \in F} \|f\|_\infty \frac{\gamma_2(F, \psi_2)}{\sqrt{N}}, \tag{1.2}$$

which is unsatisfactory when dealing with a class of unbounded or weakly
bounded functions that only have nice tails. For such classes, (1.2) is
meaningless.



An improvement to this contraction based estimate appeared in [22] and
later in [23], where it was shown that if $F$ is a symmetric subset of the
$L_2(\mu)$ unit sphere (i.e. $\|f\|_{L_2(\mu)} = 1$ and if $f \in F$ then
$-f \in F$), one has

$$\mathbb{E} \sup_{f \in F} \left| \frac{1}{N} \sum_{i=1}^N f^2(X_i) - \mathbb{E} f^2 \right| \le c \max \left\{ d_{\psi_2} \frac{\gamma_2(F, \psi_2)}{\sqrt{N}}, \frac{\gamma_2^2(F, \psi_2)}{N} \right\}. \tag{1.3}$$

Thus, the diameter of $F$ in $L_\infty$ may be replaced by its diameter in
$\psi_2$.

There are many applications that follow from (1.3). For example, it was
used in [21] to solve the approximate and exact reconstruction problems
(studied, e.g., in [10], [11], [12]) in a rather general situation that
includes any isotropic, subgaussian ensemble. However, even (1.3) still
leaves something to be desired, since it too is meaningless for a large class
of natural measures. Indeed, consider the volume measure on an isotropic
convex body in $\mathbb{R}^n$, or more generally, an isotropic, log-concave
measure on $\mathbb{R}^n$. Again, if we set
$F = \{ \langle x, \cdot \rangle : x \in S^{n-1} \}$, the class of linear
functionals of Euclidean norm one, it may have a very bad diameter with
respect to the $\psi_2(\mu)$ norm (as bad as $\sqrt{n}$), whereas, thanks to
Borell's inequality ([8], see also [27]), its $\psi_1(\mu)$ diameter is at
most an absolute constant, independent of the dimension. Thus, it seems
natural to ask whether one may replace
$d_{\psi_2} = \sup_{f \in F} \|f\|_{\psi_2}$ in (1.3) with
$d_{\psi_1} = \sup_{f \in F} \|f\|_{\psi_1}$. The main result of this article
is a positive answer to this question.

Theorem A. There exists an absolute constant $c$ for which the following
holds. If $F$ is a symmetric class of mean-zero functions on $(\Omega, \mu)$
then

$$\mathbb{E} \sup_{f \in F} \left| \frac{1}{N} \sum_{i=1}^N f^2(X_i) - \mathbb{E} f^2 \right| \le c \max \left\{ d_{\psi_1} \frac{\gamma_2(F, \psi_2)}{\sqrt{N}}, \frac{\gamma_2^2(F, \psi_2)}{N} \right\}, \tag{1.4}$$

and a similar bound holds with high probability.

A key ingredient in the proof of Theorem A, and our second main result,
deals with the structure of random coordinate projections of a given class of
functions that have nice tail properties. We will be interested in the growth
of the Euclidean norm of monotone rearrangements of vectors in $P_\sigma F$:
for every $1 \le m \le N$ consider

$$D_m = \sup_{v \in P_\sigma F} \left( \sum_{i=1}^m (v_i^*)^2 \right)^{1/2} = \sup_{f \in F} \max_{|I|=m} \left( \sum_{i \in I} f^2(X_i) \right)^{1/2},$$

where $(v_i^*)_{i=1}^N$ is a non-increasing rearrangement of
$(|v_i|)_{i=1}^N$. We will present high probability, sharp bounds on the
empirical diameters $D_m$ and use them in the proof of Theorem A.
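As a small numerical illustration of the definition of $D_m$, the following Python sketch computes the empirical diameter of a finite class given as a table of values $f(X_i)$. The function name `empirical_diameter` and the toy data are hypothetical and are not part of the article; the sketch only uses the fact that the maximum over $|I| = m$ is attained on the $m$ largest coordinates of $(f^2(X_i))_{i=1}^N$.

```python
import math


def empirical_diameter(values, m):
    """Return D_m for a finite class.

    values: list of rows; row j holds (f_j(X_1), ..., f_j(X_N)).
    m: number of coordinates kept (1 <= m <= N).
    """
    best = 0.0
    for row in values:
        # The m largest squared coordinates realize the max over |I| = m.
        squares = sorted((x * x for x in row), reverse=True)
        best = max(best, math.sqrt(sum(squares[:m])))
    return best


# Two hypothetical functions evaluated on a sample of size N = 4.
sample_values = [
    [3.0, -4.0, 0.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
]
d1 = empirical_diameter(sample_values, 1)  # largest single |f(X_i)|
d2 = empirical_diameter(sample_values, 2)  # best pair of coordinates
```

Sorting each row costs $O(N \log N)$, which avoids enumerating all subsets $I$ of cardinality $m$.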

Let us consider a simple example that indicates which bound on $D_m$ one
can hope for. Let $\mu$ be the canonical gaussian measure on $\mathbb{R}^n$.
Hence, if $(g_i)_{i=1}^n$ are independent, standard normal random variables
and $G = (g_1, ..., g_n)$, then for every Borel set $A \subset \mathbb{R}^n$,
$\mu(A) = \Pr(G \in A)$. Let $K \subset \mathbb{R}^n$, consider
$F = \{ \langle x, \cdot \rangle : x \in K \}$, the class of linear
functionals indexed by $K$, and put $(X_i)_{i=1}^N$ to be independent,
distributed according to $\mu$. Thus, $(X_i)_{i=1}^N$ are independent copies
of $G$, and the coordinate projection of $F$ is given by
$P_\sigma F = \{ (\langle x, X_i \rangle)_{i=1}^N : x \in K \}$. Observe that
there exists an absolute constant $c$ such that for every $1 \le m \le N$,

$$\mathbb{E} \sup_{v \in P_\sigma F} \left( \sum_{i=1}^m (v_i^*)^2 \right)^{1/2} \ge c \left( \mathbb{E} \sup_{x \in K} \sum_{i=1}^n g_i x_i + \sup_{x \in K} \|x\|_{\ell_2^n} \sqrt{m \log(eN/m)} \right), \tag{1.5}$$

where $\ell_2^n$ is the Euclidean norm on $\mathbb{R}^n$. Indeed, this lower
bound is evident because the first term is just the case $m = N = 1$, while
the second term is an estimate for a single point $x \in K$ which has a
maximal Euclidean norm.
The simple reasoning that leads to (1.5) gives the impression that the
estimate is far from sharp. However, it turns out that there is an upper
bound that holds in considerably more general situations, and that matches
the lower bound in the gaussian case. The complexity parameter is, again,
the $\gamma_2$ functional with respect to the $\psi_2$ norm, while the term
that represents the behavior of the "worst" function in the class is
$d_{\psi_\alpha} = \sup_{f \in F} \|f\|_{\psi_\alpha}$ for
$1 \le \alpha \le 2$.
Theorem B. For every $1 \le \alpha \le 2$ there is a constant $c_\alpha$ that
depends only on $\alpha$, and absolute constants $c_1$ and $c_2$ for which the
following holds. Let $F$ be a class of mean-zero functions. Then, for every
$u \ge c_1$, with probability at least $1 - \exp(-c_2 u \log N)$, for every
$f \in F$ and every $1 \le m \le N$,

$$\max_{|I|=m} \left( \sum_{i \in I} f^2(X_i) \right)^{1/2} \le c_\alpha u \left( \gamma_2(F, \psi_2) + d_{\psi_\alpha} m^{1/2} \log^{1/\alpha}(eN/m) \right).$$


To put this result in the right perspective let us return to the gaussian
example. If $\mu$ is the canonical gaussian measure on $\mathbb{R}^n$ then the
$\psi_2$ norm endowed on $\mathbb{R}^n$ is equivalent to the Euclidean one. In
particular, for every $m \le N$,
$\sup_{x \in K} \|x\|_{\ell_2^n} \sqrt{m \log(eN/m)}$ and
$\sup_{f \in F} \|f\|_{\psi_2} \sqrt{m \log(eN/m)}$ are equivalent. Moreover,
by the Majorizing Measures Theorem (see [36] and Section 2) and since the
Euclidean and the $\psi_2$ metrics are equivalent, so are
$\mathbb{E} \sup_{x \in K} \sum_{i=1}^n g_i x_i$ and $\gamma_2(F, \psi_2)$.
Hence, the bound in Theorem B is sharp (up to the absolute constants and the
exact probabilistic estimate) for the class of linear functionals indexed by
a subset of $\mathbb{R}^n$ and with respect to the gaussian measure.

Theorem B reveals useful information on what vectors in $P_\sigma F$ look
like for a typical sample $(X_i)_{i=1}^N$. If $N$ is relatively small, namely,
when $d_{\psi_\alpha} \sqrt{N} \ll \gamma_2(F, \psi_2)$, all the information
one has is that the Euclidean norm of any $P_\sigma f$ is at most of the order
of $\gamma_2(F, \psi_2)$. Then, for larger values of $N$ the situation
changes. For every $f \in F$ and

$$\lambda = c'_\alpha d_{\psi_\alpha} \log^{1/\alpha} \left( c N d_{\psi_\alpha}^2 / \gamma_2^2(F, \psi_2) \right),$$

the block $I = I(f) = \{ i : |f(X_i)| \ge \lambda \}$ has small cardinality:

$$\sup_{f \in F} |I(f)| \le c_\alpha \gamma_2^2(F, \psi_2) / \lambda^2.$$

Outside this block, a monotone rearrangement of any $P_\sigma f$ is dominated
coordinate-wise by a rearrangement of the vector
$(c_\alpha d_{\psi_\alpha} \log^{1/\alpha}(eN/i))_{i=1}^N$.

Note that the behavior of the "small" coordinates of each $P_\sigma f$ is
natural for a single $\psi_\alpha$ random variable. Indeed, it is
straightforward to verify that if $v$ is a $\psi_\alpha$ random variable and
$(v_i)_{i=1}^N$ is a vector of independent copies of $v$, then with high
probability, for every $i$,
$v_i^* \le c \|v\|_{\psi_\alpha} \log^{1/\alpha}(eN/i)$. Thus, our results
show that for a random sample $\sigma$, the "small coordinates" of any
$P_\sigma f$ are dominated by the typical behavior of a sample of the function
in the class with the maximal $\psi_\alpha$ norm. From that point of view,
each vector in $P_\sigma F$ can be decomposed into a "regular" part, which
behaves as if $F$ has an envelope function whose $\psi_\alpha$ norm is
$d_{\psi_\alpha}$, and a "peaky" part, which is supported on the block $I(f)$
and is bounded in $\ell_2^N$. The blocks $I(f)$ take care of the possibility
that vectors in $P_\sigma F$ have a few atypical large coordinates that are
due to the complexity of the whole class.

To formulate a weak version of the decomposition result (the full one is
presented in Theorem 4.1) we need two preliminary definitions. First, for
sets $A, B \subset \mathbb{R}^N$, $A + B = \{ a + b : a \in A, b \in B \}$ is
the Minkowski sum of $A$ and $B$. Second, we denote by $B_p^N$ the unit ball
of $\ell_p^N = (\mathbb{R}^N, \| \ \|_p)$ and by $B_{\psi_\alpha}^N$ the unit
ball of the $\psi_\alpha$ norm on $\mathbb{R}^N$, when viewed as the space of
functions on the probability space $\Omega = \{1, ..., N\}$ endowed with the
uniform probability measure.

Theorem C. There exist absolute constants $c_1, ..., c_6$ for which the
following holds. Let $F$ be a class of mean-zero functions. For
$1 \le \alpha \le 2$ and for every $N$, set

$$\lambda = c_1 \max \left\{ d_{\psi_\alpha} \log^{1/\alpha} \left( c_2 N d_{\psi_\alpha}^2 / \gamma_2^2(F, \psi_2) \right), 1 \right\}.$$

Then, for every $t \ge c_3$, with probability at least
$1 - 2\exp(-c_4 t \log N)$,

$$P_\sigma F \subset c_5 t \left( \gamma_2(F, \psi_2) B_2^N + \left( \lambda B_\infty^N \cap c_6 d_{\psi_\alpha} B_{\psi_\alpha}^N \right) \right).$$

Theorem C extends and improves one of the main results from [25]. As
we will explain in Section 4, it also extends an empirical processes version
of a theorem due to Rudelson on selector processes from [33].

Let us turn to the applications of the three theorems described above
that will be presented here. We will focus on properties of the random
operator $T = \sum_{i=1}^N \langle X_i, \cdot \rangle e_i$, considered as an
operator between an arbitrary $n$ dimensional normed space
$(\mathbb{R}^n, \| \ \|)$ and $\ell_2^N$, where $(X_i)_{i=1}^N$ are
independent, distributed according to an isotropic, log-concave measure $\mu$
on $\mathbb{R}^n$.

It is well known that many results in Asymptotic Geometric Analysis
have been obtained using certain specific random selection methods, most
often, according to the canonical gaussian measure on $\mathbb{R}^n$, or with
respect to the Haar measure on an appropriate Grassmann manifold. These
selection methods, combined with analogs of Theorem A and Theorem B for those
models of randomness, lead to geometric information on the structure of
convex bodies, most notably, to Dvoretzky type theorems and to low-$M^*$
estimates (see, e.g. [27], [31]).

We will show that sometimes it is possible to use more general sampling
methods and still obtain similar geometric results. In particular, we will
show that parts of the classical, gaussian based theory (e.g. "standard
shrinking" and low-$M^*$ estimates) may be extended to log-concave ensembles.
In fact, the gaussian parameter associated with a convex body $K$,
$\mathbb{E} \sup_{x \in K} \sum_{i=1}^n g_i x_i$, which is used as a
complexity parameter in the classical gaussian based theory, is replaced in
our results by $\gamma_2(K, \psi_2)$. And, although the two complexity
parameters are seemingly different, it can be shown that they coincide if one
resorts to the original sampling methods.

Because of their general nature, Theorems A, B and C have many other
applications in very different directions, and these will not be presented
here. For example (out of many), our results can be used to extend the
analysis from the known cases to other ensembles of the reconstruction
problem, approximate and exact (see, for example, [10], [11], [12], [23]), of
the statistical persistence problem [TU [6] and of various embedding problems.
Some of the applications are straightforward but others are more difficult,
since obtaining sharp estimates on the complexity parameter
$\gamma_2(F, \psi_2)$ can be nontrivial. To keep this article at a reasonable
length and to maintain its focus on the structural, empirical processes
oriented results, we chose to defer the presentation of most of the
applications to a later work.

The article is organized as follows. In Section 2 we will present
preliminary results and several definitions we will need. Then, in Section 3
we will prove Theorem B and Section 4 will be devoted to the proof of Theorem
C. Theorem A will be proved in Section 5 and in Section 6 we will present
some applications of the three theorems.

2 Preliminaries

Let us begin with notational conventions. Throughout, all absolute
constants are positive numbers, denoted by $c, c_0, c_1, ...$ etc. Their value
may change from line to line. We use $\kappa_0, \kappa_1, ...$ for constants
whose value will remain unchanged. By $A \sim B$ we mean that there are
absolute constants $c$ and $C$ such that $cB \le A \le CB$, and by
$A \lesssim B$ that $A \le CB$. For $1 \le p \le \infty$, $\ell_p^n$ is
$\mathbb{R}^n$ endowed with the $\ell_p$ norm, which we denote by
$\| \ \|_p$, and $B_p^n$ is its unit ball. With a minor abuse of notation we
denote by $| \ |$ both the cardinality of a set and the absolute value.

We say that $K \subset \mathbb{R}^n$ is a convex body if it is a compact,
convex and symmetric set (that is, if $x \in K$ then $-x \in K$) with a
nonempty interior. If $K$ is a convex body we denote by $\| \ \|_K$ the norm
on $\mathbb{R}^n$ whose unit ball is $K$ and set
$K^\circ = \{ y : \langle x, y \rangle \le 1 \ \forall x \in K \}$ to be its
polar body.

Given a probability measure $\mu$ and a sample $(X_i)_{i=1}^N$, we will
sometimes write $P_N f = N^{-1} \sum_{i=1}^N f(X_i)$ and $Pf = \mathbb{E} f$.
Hence, the supremum of the empirical process indexed by $F$ is
$\sup_{f \in F} |P_N f - Pf|$. Given $f \in F$ and
$\sigma \subset \{1, ..., N\}$ we set $P_\sigma f = (f(X_i))_{i \in \sigma}$.



A significant part of our discussion will use basic properties of sums of
independent random variables that have nice tails. The proofs of the claims
presented here may be found, for example, in [23], [18] or [37].

Recall that a random variable has a bounded $\psi_\alpha$ norm for
$1 \le \alpha \le 2$ if there is some constant $C$ for which
$\mathbb{E} \exp(|f|^\alpha / C^\alpha) \le 2$, in which case one sets

$$\|f\|_{\psi_\alpha} = \inf \{ C : \mathbb{E} \exp(|f|^\alpha / C^\alpha) \le 2 \}.$$

One can show that there is an absolute constant $c$ such that if
$f \in L_{\psi_\alpha}$, then for every $t \ge 1$,
$\Pr(|f| \ge t) \le 2\exp(-c t^\alpha / \|f\|_{\psi_\alpha}^\alpha)$.
Conversely, there is an absolute constant $c_1$ such that if $f$ displays a
tail behavior dominated by $\exp(-t^\alpha / K^\alpha)$ for some
$1 \le \alpha \le 2$, then $f \in L_{\psi_\alpha}$ and
$\|f\|_{\psi_\alpha} \le c_1 K$. We say that $X$ is a subgaussian random
variable if $\|X\|_{\psi_2} < \infty$.
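To make the definition of the $\psi_\alpha$ norm concrete, here is a minimal Python sketch that evaluates it for the empirical (uniform) measure on a finite sample. The function name `psi_norm` and the bisection tolerance are assumptions made for this illustration; the article itself does not describe any numerical procedure. The sketch relies only on the fact that $C \mapsto \mathbb{E} \exp(|f|^\alpha / C^\alpha)$ is non-increasing in $C$.

```python
import math


def psi_norm(sample, alpha, tol=1e-9):
    """psi_alpha norm of the uniform distribution on `sample`.

    Finds inf{C > 0 : mean(exp(|x|^alpha / C^alpha)) <= 2} by bisection.
    """
    largest = max(abs(x) for x in sample)
    if largest == 0.0:
        return 0.0

    def orlicz_mean(c):
        return sum(math.exp((abs(x) / c) ** alpha) for x in sample) / len(sample)

    # At this value of C every term is at most exp(log 2) = 2, so it is feasible.
    hi = largest / math.log(2.0) ** (1.0 / alpha)
    lo = 0.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if orlicz_mean(mid) <= 2.0:
            hi = mid
        else:
            lo = mid
    return hi


# A symmetric sign variable: the norm solves exp(1 / C^alpha) = 2.
sign_psi1 = psi_norm([1.0, -1.0], alpha=1.0)
sign_psi2 = psi_norm([1.0, -1.0], alpha=2.0)
```

For a bounded variable such as the sign variable above, both norms are finite and comparable; the distinction between $\psi_1$ and $\psi_2$ only becomes significant for heavy-tailed samples.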

Lemma 2.1 There exists an absolute constant $c$ for which the following
holds. Let $X$ be a mean-zero, subgaussian random variable and let
$X_1, ..., X_k$ be independent copies of $X$. Then, for every fixed
$a = (a_1, ..., a_k) \in \mathbb{R}^k$,
$\| \sum_{i=1}^k a_i X_i \|_{\psi_2} \le c \|X\|_{\psi_2} \|a\|_2$. Thus, for
every $t > 0$,

$$\Pr \left( \left| \sum_{i=1}^k a_i X_i \right| \ge c t \|X\|_{\psi_2} \|a\|_2 \right) \le 2\exp(-t^2/2).$$

In particular, if $(\varepsilon_i)_{i=1}^k$ are independent, symmetric
$\{-1, 1\}$-valued random variables, then for every $(a_i)_{i=1}^k$,

$$\Pr \left( \left| \sum_{i=1}^k a_i \varepsilon_i \right| \ge c t \|a\|_2 \right) \le 2\exp(-t^2/2).$$

For sums of independent $\psi_1$ random variables the situation is more
delicate, and one should expect two types of behaviors: an early subgaussian
decay followed by a subexponential one, as Bernstein's inequality shows.

Lemma 2.2 There exists an absolute constant $c$ for which the following
holds. Let $X_1, ..., X_N$ be independent copies of a mean-zero random
variable $X$. Then, for any $t > 0$,

$$\Pr \left( \left| \frac{1}{N} \sum_{i=1}^N X_i \right| > t \right) \le 2\exp \left( -cN \min \left\{ \frac{t^2}{\|X\|_{\psi_1}^2}, \frac{t}{\|X\|_{\psi_1}} \right\} \right).$$

This estimate may be extended to other values of $\alpha$. The next lemma
is a standard outcome of Corollaries 2.9 and 2.10 from [35] (see [2] for the
proof).

Lemma 2.3 Let $1 \le \alpha \le 2$ and let $(X_i)_{i=1}^N$ be independent,
mean-zero random variables such that $\|X_i\|_{\psi_\alpha} \le A$ for every
$1 \le i \le N$. Then, for every $(a_i)_{i=1}^N \in \mathbb{R}^N$ and any
$t > 0$,

$$\Pr \left( \left| \sum_{i=1}^N a_i X_i \right| \ge tA \right) \le 2\exp \left( -c \min \left\{ \frac{t^2}{\|a\|_2^2}, \frac{t^\alpha}{\|a\|_{\alpha^*}^\alpha} \right\} \right),$$

where $1/\alpha + 1/\alpha^* = 1$ and $c$ is an absolute constant.

Next, let us turn to the main complexity parameter we will use:
Talagrand's $\gamma_2$ functional.

Definition 2.4 [36] For a metric space $(T, d)$, an admissible sequence of
$T$ is a collection of subsets of $T$, $\{ T_s : s \ge 0 \}$, such that for
every $s \ge 1$, $|T_s| \le 2^{2^s}$ and $|T_0| = 1$. For $\beta \ge 1$,
define the $\gamma_\beta$ functional by

$$\gamma_\beta(T, d) = \inf \sup_{t \in T} \sum_{s=0}^\infty 2^{s/\beta} d(t, T_s),$$

where the infimum is taken with respect to all admissible sequences of $T$.
For an admissible sequence $(T_s)_{s \ge 0}$ we denote by $\pi_s t$ a nearest
point to $t$ in $T_s$ with respect to the metric $d$.
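The definition can be illustrated on a finite metric space, where any fixed admissible sequence yields an upper bound on $\gamma_\beta(T, d)$. The Python sketch below is a hedged illustration only: the name `chaining_cost`, the example points and the two candidate sequences are hypothetical. It assumes that the last listed set contains every point of $T$, so that all later terms of the sum vanish when the sequence is continued with $T_s = T$.

```python
def chaining_cost(points, dist, admissible, beta=2.0):
    """sup_t sum_s 2^(s/beta) d(t, T_s) for one admissible sequence.

    admissible: list [T_0, T_1, ...]; the last set must contain all points.
    The returned value is an upper bound on gamma_beta(points, dist).
    """
    for s, subset in enumerate(admissible):
        limit = 1 if s == 0 else 2 ** (2 ** s)
        if len(subset) > limit:
            raise ValueError("T_%d has too many points" % s)
    if any(t not in admissible[-1] for t in points):
        raise ValueError("the last set must contain every point")
    return max(
        sum(
            2 ** (s / beta) * min(dist(t, u) for u in subset)
            for s, subset in enumerate(admissible)
        )
        for t in points
    )


# Four points on the real line with the usual distance.
line = [0.0, 1.0, 2.0, 3.0]
metric = lambda a, b: abs(a - b)
cost_centered = chaining_cost(line, metric, [[1.0], line])  # T_0 = {1}
cost_endpoint = chaining_cost(line, metric, [[0.0], line])  # T_0 = {0}
```

Taking the smaller of the two costs already shows how the choice of $T_0$ affects the bound; computing the true infimum over all admissible sequences is a separate and harder problem that this sketch does not attempt.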

When considered for a set $T \subset L_2$, $\gamma_2$ has close connections
with properties of the canonical gaussian process indexed by $T$, and we refer
the reader to [13, 36] for detailed expositions on these connections. One can
show that under mild measurability assumptions, if $\{ G_t : t \in T \}$ is a
centered gaussian process indexed by a set $T$ then

$$c_1 \gamma_2(T, d) \le \mathbb{E} \sup_{t \in T} G_t \le c_2 \gamma_2(T, d),$$

where $c_1$ and $c_2$ are absolute constants and for every $s, t \in T$,
$d^2(s, t) = \mathbb{E} |G_s - G_t|^2$. The upper bound is due to Fernique
[J3] and the lower bound is Talagrand's Majorizing Measures Theorem [34]. Note
that if $T \subset \mathbb{R}^n$, $(g_i)_{i=1}^n$ are standard, independent
gaussians and $G_t = \sum_{i=1}^n g_i t_i$ then $d(s, t) = \|s - t\|_2$, and
therefore

$$c_1 \gamma_2(T, \| \cdot \|_2) \le \mathbb{E} \sup_{t \in T} \sum_{i=1}^n g_i t_i \le c_2 \gamma_2(T, \| \cdot \|_2). \tag{2.1}$$
Note that a closely related complexity parameter that is used to describe
geometric properties of a convex body $K$ is

$$M^*(K) = \int_{S^{n-1}} \|x\|_{K^\circ} \, d\sigma(x),$$

where $\sigma$ is the Haar measure on the sphere $S^{n-1}$. This parameter is
gaussian in nature and it is straightforward to verify that

$$\sqrt{n} M^*(K) \sim \mathbb{E} \sup_{x \in K} \sum_{i=1}^n g_i x_i.$$

It is well known that chaining methods lead to simple bounds on empirical
processes. Indeed, the following result is a combination of a chaining
argument with Lemma 2.1 or with Lemma 2.2.

Theorem 2.5 There exists an absolute constant $c$ for which the following
holds. If $F$ is a class of functions on $(\Omega, \mu)$ then for every
integer $N$,

$$\mathbb{E} \sup_{f \in F} |P_N f - Pf| \le c \left( \frac{\gamma_2(F, \psi_1)}{\sqrt{N}} + \frac{\gamma_1(F, \psi_1)}{N} \right)$$

and

$$\mathbb{E} \sup_{f \in F} |P_N f - Pf| \le c \frac{\gamma_2(F, \psi_2)}{\sqrt{N}}.$$

Similar bounds hold with high probability.

Results of this flavor may be found in Chapters 1 and 2.7 of [36].

Finally, in Section 6 we will be interested in isotropic, log-concave
measures.

Definition 2.6 A symmetric probability measure $\mu$ on $\mathbb{R}^n$ is
called isotropic if for every $y \in \mathbb{R}^n$,
$\int |\langle x, y \rangle|^2 \, d\mu(x) = \|y\|_2^2$.

We say that a measure $\mu$ on $\mathbb{R}^n$ is $L$-subgaussian if for every
$x \in \mathbb{R}^n$,

$$\| \langle x, \cdot \rangle \|_{\psi_2(\mu)} \le L \| \langle x, \cdot \rangle \|_{L_2(\mu)}.$$

The measure $\mu$ is log-concave if for every $0 < \lambda < 1$ and every
nonempty Borel measurable sets $A, B \subset \mathbb{R}^n$,
$\mu(\lambda A + (1 - \lambda) B) \ge \mu(A)^\lambda \mu(B)^{1 - \lambda}$.

The canonical gaussian measure on $\mathbb{R}^n$ is clearly isotropic and
subgaussian, with $L$ being an absolute constant. Lemma 2.1 implies that the
same holds for the uniform measure on $\{-1, 1\}^n$.



A typical example of a log-concave measure on $\mathbb{R}^n$ is the volume
measure of a convex body in $\mathbb{R}^n$, a fact that follows from the
Brunn-Minkowski inequality (see, e.g. [31]). Moreover, Borell's inequality
[8], [27] implies that there is an absolute constant $c$ such that if $\mu$ is
an isotropic, log-concave measure on $\mathbb{R}^n$, then for every
$x \in \mathbb{R}^n$,
$\| \langle x, \cdot \rangle \|_{\psi_1} \le c \| \langle x, \cdot \rangle \|_{L_2} = c \|x\|_2$.
There are isotropic bodies with a subgaussian volume measure, for example,
isotropic positions of $B_p^n$ for $p \ge 2$ [3]. However, the general
situation is completely different, and there are many examples of volume
measures of isotropic convex bodies in $\mathbb{R}^n$ for which linear
functionals are far from exhibiting a bounded $\psi_2$ behavior. In fact,
$\| \langle x, \cdot \rangle \|_{\psi_2}$ may be as large as
$\sqrt{n} \|x\|_2$ (for example, $x = e_1$ and the volume measure on an
isotropic position of $B_1^n$). We refer the reader to [16] for a survey on
properties of the volume measure of isotropic convex bodies and, more
generally, of isotropic log-concave measures on $\mathbb{R}^n$.

3 Bounding the diameter 

This section is devoted to the proof of Theorem B. Although we will present a
complete proof only for $\alpha = 1$, we will indicate the very minor
modifications that are needed to prove it for any $1 \le \alpha \le 2$.

The first step in the proof of Theorem B is to construct a good cover of
the Euclidean unit ball $B_2^N$, an idea which was used for a very similar
goal in the proof of the main result in [1].

Definition 3.1 If $A, B \subset \mathbb{R}^n$, we denote by $N(A, B)$ the
smallest number of points $x_i \in A$ such that
$A \subset \bigcup_i (x_i + B)$.

If $\| \ \|$ is a norm on $\mathbb{R}^n$ and
$B = \{ x : \|x\| \le \varepsilon \}$, then the set $\{x_i\}$ is called an
$\varepsilon$-cover of $A$ with respect to the norm $\| \ \|$.

Clearly, if $B$ is an $\varepsilon$ ball of some norm, then $N(A, B)$ is the
smallest cardinality of a set $\{x_i\}$ such that for every $a \in A$,
$\min_i \|a - x_i\| \le \varepsilon$.

Fix an integer $N$ and define the following sets: for $1 \le \ell \le N/2$ put

$$A_\ell = \left\{ z \in B_2^N : |\mathrm{supp}(z)| \le \ell, \ \|z\|_\infty \le 1/\sqrt{\ell} \right\}.$$

Let $\varepsilon_\ell = \ell/N$, set $N_\ell \subset A_\ell$ to be an
$\varepsilon_\ell$-cover of $A_\ell$ with respect to the $\ell_2^N$ norm and
let $P_I : \mathbb{R}^N \to \mathbb{R}^N$ be the orthogonal projection onto
the space spanned by the coordinates $(e_i)_{i \in I}$, that is,
$P_I x = \sum_{i \in I} \langle e_i, x \rangle e_i$. A standard volumetric
estimate shows that for every convex body $K \subset \mathbb{R}^n$,
$N(K, \varepsilon K) \le (2/\varepsilon)^n$. Therefore,

$$|N_\ell| \le \sum_{|I| = \ell} N(P_I B_2^N, \varepsilon_\ell B_2^N) \le \binom{N}{\ell} \left( \frac{2}{\varepsilon_\ell} \right)^\ell \le \exp(c_0 \ell \log(eN/\ell)) \tag{3.1}$$

for a suitable absolute constant $c_0$.

Fix an integer $m \le N$ and assume that $m = 2^{r_0}$ for some integer
$r_0$. Define the sets $B_m$ as follows:

$$B_m = \left\{ z \in B_2^N : |\mathrm{supp}(z)| \le m, \ \mathrm{supp}(z) = \bigcup_{r=0}^{r_0 - 1} I_r, \ P_{I_r} z \in N_{|I_r|} \right\}, \tag{3.2}$$

where $(I_r)_{r=0}^{r_0 - 1}$ are disjoint sets of coordinates, $|I_0| = 2$
and for $r \ge 1$, $|I_r| = 2^r$ (and thus the cardinality of their union is
$m$).

It is evident that $B_m$ consists of vectors in $B_2^N$ that can be written as
a sum over disjoint sets of coordinates $I_r$ of cardinality $2^r$, and the
projection onto each one of the "blocks" $I_r$ belongs to the net
$N_{|I_r|}$, and thus to $A_{|I_r|}$.

It is standard to verify that for every $m = 2^{r_0}$,

$$|B_m| \le |N_2| \cdot \prod_{r=1}^{r_0 - 1} |N_{2^r}| \le \prod_{r=1}^{r_0 - 1} \exp(c 2^r \log(eN/2^r)) \le \exp(c_1 m \log(eN/m)).$$

Let
$D_m = \sup_{f \in F} \sup_{|I| = m} \left( \sum_{i \in I} f^2(X_i) \right)^{1/2}$.
The next lemma shows that in order to bound $D_m$ it is enough to consider the
linearized process indexed by $F \times B_m$ and defined by
$(f, v) \to \sum_{i=1}^N f(X_i) v_i$. Although Lemma 3.2 is a purely
deterministic result, it is formulated in the "random" context in which it
will be used.

Lemma 3.2 There exists an absolute constant $C$ such that for every
$m \le N/2$ satisfying $m = 2^{r_0}$ for some integer $r_0$, and for every
$X_1, ..., X_N$,

$$D_m \le C \sup_{f \in F} \sup_{v \in B_m} \sum_{i=1}^N v_i f(X_i).$$

Proof. Let $m = 2^{r_0}$ for some integer $r_0$ and assume that $m \le N/2$.
If $v \in B_2^N$ satisfies $|\mathrm{supp}(v)| \le m$, let
$(v_i^*)_{i=1}^N$ be a monotone non-increasing rearrangement of
$(|v_i|)_{i=1}^N$ and put $|v_{\sigma(j)}| = v_j^*$, where $\sigma$ is the
suitable permutation of $\{1, ..., N\}$. Consider the sets
$(I_r)_{r=0}^{r_0 - 1}$ defined as follows: $I_0 = \{\sigma(1), \sigma(2)\}$
are the largest two coordinates of $(|v_i|)_{i=1}^N$,
$I_1 = \{\sigma(3), \sigma(4)\}$ are the two following that, and so on:
$I_r = \{\sigma(2^r + 1), ..., \sigma(2^{r+1})\}$ for $r \ge 1$. Thus,
$|I_r| = 2^r$ for $r \ge 1$ and $|I_0| = 2$. Since $v_j^* \le 1/\sqrt{j}$,
then for every $r$,

$$\|P_{I_r} v\|_\infty \le 1/|I_r|^{1/2} = 2^{-r/2},$$

and thus $P_{I_r} v \in A_{2^r}$. Let $\tilde{v} \in B_m$ be such that for
every $1 \le r \le r_0 - 1$,
$P_{I_r} \tilde{v} \in N_{|I_r|} = N_{2^r}$,
$\|P_{I_r} v - P_{I_r} \tilde{v}\|_2 \le 2^r/N$ and
$\|P_{I_0} v - P_{I_0} \tilde{v}\|_2 \le 2/N$. Therefore,
$\|v - \tilde{v}\|_2 \le 2/N + \sum_{r=1}^{r_0 - 1} 2^r/N \le m/N$, and thus,
if we set $U_m = \{ u \in B_2^N : |\mathrm{supp}(u)| \le m \}$ then

$$D_m = \sup_{f \in F, \, |I| = m} \left( \sum_{i \in I} f^2(X_i) \right)^{1/2} = \sup_{f \in F, \, |I| = m} \sup_{v \in U_m} \sum_{i \in I} v_i f(X_i)$$

$$\le \sup_{f \in F, \, |I| = m} \sup_{v \in U_m} \sum_{i \in I} (v_i - \tilde{v}_i) f(X_i) + \sup_{f \in F, \, |I| = m} \sup_{v \in U_m} \sum_{i \in I} \tilde{v}_i f(X_i)$$

$$\le (m/N) D_m + \sup_{f \in F} \sup_{v \in U_m} \sum_{i=1}^N \tilde{v}_i f(X_i)$$

$$\le (m/N) D_m + \sup_{f \in F} \sup_{v \in B_m} \sum_{i=1}^N v_i f(X_i).$$

Since $m \le N/2$ it is evident that
$D_m \le 2 \sup_{f \in F} \sup_{v \in B_m} \sum_{i=1}^N v_i f(X_i)$, as
claimed. ∎



Remark 3.3 Observe that for every $1 \le j \le r_0 - 1$ and every
$v \in B_{2^{j+1}}$, $P_{\bigcup_{r < j} I_r} v \in B_{2^j}$, a fact which
will be used in the dimension reduction procedure that is needed in the proof
of Theorem B.

We will need two simple observations about sums of centered random
variables, both of which follow from Lemma 2.1 and Lemma 2.3. First, if
$\mathbb{E} f = 0$ then for every $t > 0$ and any $I \subset \{1, ..., N\}$,

$$\Pr \left( \left| \sum_{i \in I} v_i f(X_i) \right| \ge t \|P_I v\|_2 \|f\|_{\psi_2} \right) \le 2\exp(-c_0 t^2).$$

Second, if $\mathbb{E} f = 0$ then for every $t > 0$ and any
$I \subset \{1, ..., N\}$,

$$\Pr \left( \left| \sum_{i \in I} v_i f(X_i) \right| \ge t |I| \|P_I v\|_\infty \|f\|_{\psi_1} \right) \le 2\exp(-c_0 |I| \min(t^2, t)), \tag{3.3}$$

where in both cases $c_0$ is an absolute constant.

Before proving Theorem B we need a few more definitions. Let $E_\ell$
be the collection of all subsets of $\{1, ..., N\}$ of cardinality $\ell$.
Note that there is an absolute constant $\kappa_0$ such that for every integer
$1 \le \ell \le N$,
$\exp(\kappa_0 \ell \log(eN/\ell)) \ge \max\{ |E_\ell|, |B_\ell| \}$, and
define $s_\ell$ to be the first integer which satisfies that
$2^{2^{s_\ell}} \ge \exp(\kappa_0 \ell \log(eN/\ell))$.

The chaining argument we will use for
$f \to \sup_{v \in B_m} \sum_{i=1}^N v_i f(X_i)$ consists of three parts.
First, when $s \ge s_m$, the number of vectors in $B_m$ is much smaller than
the number of possible "links" in all the chains, and thus no special
treatment is needed. In the middle part, when $s_2 \le s < s_m$, there will be
a simultaneous reduction in the level $s$ and in the dimension, which will be
achieved by passing from the set $B_m$ to the sets $B_{m/2^r}$ for the correct
value of $r$. Finally, when $s < s_2$ no further chaining will be required
because the cardinality of the indexing sets is small enough.

Let us reformulate Theorem B. 

Theorem 3.4 For every $1 \le \alpha \le 2$ there are constants $c_\alpha$ and
$C_\alpha$ that depend only on $\alpha$, and there exist absolute constants
$c_1 \ge 1$ and $c_2$ for which the following holds. Let $F$ be a class of
mean-zero functions and let $(F_s)_{s \ge 0}$ be an admissible sequence of
$F$. Then, for every $t \ge c_1$ and every integer $N$, with probability at
least $1 - 2\exp(-c_2 t \log N)$, for every $m \le N$ and every $f \in F$,

$$\sup_{v \in B_m} \sum_{i=1}^N v_i f(X_i) \le c_\alpha t \left( \sum_{s=0}^\infty 2^{s/2} \|\pi_s f - \pi_{s-1} f\|_{\psi_2} + d_{\psi_\alpha} \sqrt{m} \log^{1/\alpha}(eN/m) \right),$$

where $d_{\psi_\alpha} = \sup_{f \in F} \|f\|_{\psi_\alpha}$.

In particular, with that probability, for every $m \le N$,

$$D_m \le C_\alpha t \left( \gamma_2(F, \psi_2) + d_{\psi_\alpha} \sqrt{m} \log^{1/\alpha}(eN/m) \right).$$

As we said, we will present the proof of Theorem 3.4 only for $\alpha = 1$.
The proof for $1 < \alpha \le 2$ is identical, with the exception that (3.3)
is replaced by an appropriate deviation estimate for $\psi_\alpha$ random
variables, as stated in Lemma 2.3.

Proof. Let $\{ F_s : s \ge 0 \}$ be an admissible sequence of $F$ and without
loss of generality, assume that $m = 2^{r_0}$ for some integer $r_0$.

To begin the first part of the chaining argument, for every fixed $f$ set
$(\Delta_s f)_i = (\pi_s f - \pi_{s-1} f)(X_i)$. Then, for every $f \in F$
and $v \in B_m$,

$$\sum_{i=1}^N v_i f(X_i) = \sum_{s > s_m} \sum_{i=1}^N v_i (\Delta_s f)_i + \sum_{i=1}^N v_i (\pi_{s_m} f)(X_i).$$

Since the cardinality of the set
$\Delta_s = \{ \pi_s f - \pi_{s-1} f : f \in F \}$ is at most $2^{2^{s+1}}$
and since $|B_m| \le \exp(\kappa_0 m \log(eN/m))$, then by the definition of
$s_m$ and a $\psi_2$ estimate, if $t$ is larger than an absolute constant, one
has

$$\Pr \left( \exists f \in F, \ v \in B_m : \left| \sum_{s > s_m} \sum_{i=1}^N v_i (\Delta_s f)_i \right| \ge t \|v\|_2 \sum_{s > s_m} 2^{s/2} \|\Delta_s f\|_{\psi_2} \right)$$

$$\le |B_m| \cdot 2 \sum_{s > s_m} |\Delta_s| \exp(-c_1 2^s t^2) \le 2\exp(-c_2 2^{s_m} t^2).$$

Now, let us turn to the "middle part", in which the structure of vectors that belong to $B_m$ is used. First, consider the integers $s_m$, $s_{m/2}$, etc. From the definition of $s_\ell$ it follows that there is an absolute constant $c_3$ such that for every $1 \leq \ell \leq N$, $s_\ell$ satisfies that

\[ 2^{s_\ell} \geq c_3 \ell \log(eN/\ell), \qquad 2^{s_\ell - 1} < c_3 \ell \log(eN/\ell). \]

In particular, $2^{s_\ell - 2} < c_3 (\ell/2) \log(eN/\ell) \leq c_3 (\ell/2) \log(eN/(\ell/2))$, implying that $s_\ell - 1 \leq s_{\ell/2} \leq s_\ell$. A similar argument shows that $s_{\ell/4} < s_\ell$ if $\ell < N/2$, and thus, either $s_{\ell/2} = s_\ell - 1$ or, if $s_{\ell/2} = s_\ell$, then $s_{\ell/4} = s_\ell - 1$. In any case, if one considers the sequence $s_\ell, s_{\ell/2}, \ldots, s_{\ell/2^r}$, it decreases in steps of at most one and remains constant on blocks of cardinality at most two.
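The behaviour of the levels along the halving sequence can be checked numerically. The sketch below is only an illustration: it assumes that $s_\ell$ is the smallest integer $s$ with $2^s \geq c\,\ell\log(eN/\ell)$ and takes the hypothetical value $c = 1$; the function names are not from the paper.

```python
import math

def s_level(ell, N, c=1.0):
    """Smallest integer s >= 0 with 2**s >= c * ell * log(e*N/ell) (assumed definition)."""
    target = c * ell * math.log(math.e * N / ell)
    s = 0
    while 2 ** s < target:
        s += 1
    return s

def halving_levels(m, N, c=1.0):
    """Return [s_m, s_{m/2}, ..., s_1] for m a power of two."""
    levels = []
    ell = m
    while ell >= 1:
        levels.append(s_level(ell, N, c))
        ell //= 2
    return levels

def longest_run(levels):
    """Length of the longest block of equal consecutive entries."""
    best = run = 1
    for a, b in zip(levels, levels[1:]):
        run = run + 1 if a == b else 1
        best = max(best, run)
    return best
```

For instance, `halving_levels(2 ** 10, 2 ** 20)` lists the levels for $m = 2^{10}$ and $N = 2^{20}$; consecutive entries differ by at most one and no value appears more than twice in a row.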

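Fix any $v \in B_m$, and one may assume that $|\mathrm{supp}(v)| = m$. Let $\ell_r = m/2^r$ and put $I_{\ell_1} \subset \mathrm{supp}(v)$ to be a set of $m/2$ coordinates such that $\|P_{I_{\ell_1}} v\|_\infty \leq 1/(m/2)^{1/2}$ (such a set of coordinates exists by the definition of $B_m$). Denote by $J_1$ the complement of $I_{\ell_1}$ in $\mathrm{supp}(v)$, and observe that $P_{J_1} v \in B_{m/2}$ (where, of course, $I_{\ell_1}$ and $J_1$ depend on $v$). Hence,

\[ \sum_{i \in I_{\ell_1}} v_i (\pi_{s_m} f)(X_i) = \sum_{i \in I_{\ell_1}} v_i (\pi_{s_m} f - \pi_{s_{m/2}} f)(X_i) + \sum_{i \in I_{\ell_1}} v_i (\pi_{s_{m/2}} f)(X_i), \tag{3.4} \]

and

\[ \sum_{i \in J_1} v_i (\pi_{s_m} f)(X_i) = \sum_{i \in J_1} v_i (\pi_{s_m} f - \pi_{s_{m/2}} f)(X_i) + \sum_{i \in J_1} v_i (\pi_{s_{m/2}} f)(X_i). \tag{3.5} \]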



We will estimate the first part of (3.4) using a $\psi_2$ argument and the second one using the $\psi_1$ information. Indeed, there are at most $2^{2^{s_m + 1}}$ elements of the form $\pi_{s_m} f - \pi_{s_{m/2}} f$, and at most $|B_m|$ vectors $v$. Since $|B_m| \leq \exp(\kappa_0 m \log(eN/m))$ then from the definition of $s_m$ it follows that with probability at least $1 - 2\exp(-c_4 t^2 2^{s_m})$, for every $f \in F$ and $v \in B_m$

\[ \Bigl| \sum_{i \in I_{\ell_1}} v_i (\pi_{s_m} f - \pi_{s_{m/2}} f)(X_i) \Bigr| \leq t 2^{s_m/2} \|P_{I_{\ell_1}} v\|_2 \|\pi_{s_m} f - \pi_{s_{m/2}} f\|_{\psi_2}. \tag{3.6} \]



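To handle the second term, recall that for every $v \in B_m$, $\|P_{I_{\ell_1}} v\|_\infty \leq 1/(m/2)^{1/2}$ and $|I_{\ell_1}| = m/2$. Hence, for every $f \in F$, $v \in B_m$ and $u > 0$,

\[ \Pr\Bigl( \Bigl| \sum_{i \in I_{\ell_1}} v_i (\pi_{s_{m/2}} f)(X_i) \Bigr| > u \|\pi_{s_{m/2}} f\|_{\psi_1} (m/2)^{1/2} \Bigr) \leq 2\exp(-c_5 |I_{\ell_1}| \min(u^2, u)). \]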



In particular, if one takes $u = t 2^{s_m}/|I_{\ell_1}|$ (which is of the order of $\log(eN/m)$), then by our estimates on the cardinality of $B_m$ and the definition of $s_{m/2}$, it follows that with probability at least $1 - 2\exp(-c_6 t 2^{s_m})$, for every $f \in F$ and every $v \in B_m$,

\[ \Bigl| \sum_{i \in I_{\ell_1}} v_i (\pi_{s_{m/2}} f)(X_i) \Bigr| \leq t d_{\psi_1} \sqrt{m} \log(eN/m). \]



Turning to (3.5), the first term can be bounded exactly as in (3.6), while in the second term of (3.5) the required dimension reduction is achieved: all the vectors $P_{J_1} v$ belong to $B_{m/2}$ and the indexing class is $F_{s_{m/2}}$.

The same argument can be repeated, by breaking each $J_1$ into $I_{\ell_2}$ and its complement in $J_1$ (which we denote by $J_2$), just as in (3.4) and (3.5). At the $r$-th step one begins with vectors $P_{J_{r-1}} v$ that belong to $B_{m/2^{r-1}}$, and an indexing set $F_{s_{m/2^{r-1}}}$. It follows that with probability at least $1 - 4\exp(-c_6 t^2 2^{s_{m/2^{r-1}}}) - 2\exp(-c_6 t 2^{s_{m/2^{r-1}}})$, for every $f \in F$ and $v \in B_m$,

\[ \Bigl| \sum_{i=1}^N (P_{J_{r-1}} v)_i (\pi_{s_{m/2^{r-1}}} f)(X_i) \Bigr| \leq t 2^{s_{m/2^{r-1}}/2} \|\pi_{s_{m/2^{r-1}}} f - \pi_{s_{m/2^r}} f\|_{\psi_2} \bigl( \|P_{I_{\ell_r}} v\|_2 + \|P_{J_r} v\|_2 \bigr) + t d_{\psi_1} 2^{s_{m/2^{r-1}}} \|P_{I_{\ell_r}} v\|_\infty + \Bigl| \sum_{i=1}^N (P_{J_r} v)_i (\pi_{s_{m/2^r}} f)(X_i) \Bigr|. \]

Since $2^{s_{m/2^r}} \sim (m/2^r) \log(eN/(m/2^r))$ and $\|P_{I_{\ell_r}} v\|_\infty \leq 1/(m/2^r)^{1/2}$, then $2^{s_{m/2^{r-1}}} \|P_{I_{\ell_r}} v\|_\infty \lesssim (m/2^{r-1})^{1/2} \log(eN/(m/2^{r-1}))$. Moreover, $\|P_{J_r} v\|_2 + \|P_{I_{\ell_r}} v\|_2 \leq 2\|P_{J_{r-1}} v\|_2 \leq 2$, and thus the first two terms are bounded by

\[ c_7 t 2^{s_{m/2^{r-1}}/2} \|\pi_{s_{m/2^{r-1}}} f - \pi_{s_{m/2^r}} f\|_{\psi_2} + c_7 t d_{\psi_1} (m/2^{r-1})^{1/2} \log(eN/(m/2^{r-1})). \]

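Hence, if we continue in this fashion until $s_2 = s_{m/2^{r_0 - 1}}$, it follows that for $t \geq c_8$, with probability at least

\[ 1 - 4 \sum_{r=1}^{r_0} \bigl( \exp(-c_6 t^2 2^{s_{m/2^{r-1}}}) + \exp(-c_6 t 2^{s_{m/2^{r-1}}}) \bigr), \tag{3.7} \]

for every $f \in F$ and every $v \in B_m$,

\[ \Bigl| \sum_{i=1}^N v_i (\pi_{s_m} f)(X_i) \Bigr| \leq c_9 t \Bigl( \sum_{r=1}^{r_0} 2^{s_{m/2^{r-1}}/2} \|\pi_{s_{m/2^{r-1}}} f - \pi_{s_{m/2^r}} f\|_{\psi_2} + d_{\psi_1} \sqrt{m} \log(eN/m) \Bigr) + \Bigl| \sum_{i=1}^N (P_{J_{r_0 - 1}} v)_i (\pi_{s_2} f)(X_i) \Bigr|. \]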

Observe that the elements of the sequence $(s_{m/2^r})_{r=1}^{r_0}$ belong to the interval $[s_1, s_m]$. Also, this sequence decreases in steps of at most one, and each integer is repeated at most twice. Hence,

\[ \sum_{r=1}^{r_0} 2^{s_{m/2^{r-1}}/2} \|\pi_{s_{m/2^{r-1}}} f - \pi_{s_{m/2^r}} f\|_{\psi_2} \leq 2 \sum_{s \leq s_m} 2^{s/2} \|\Delta_s(f)\|_{\psi_2}, \]

and the probabilistic estimate in (3.7) is at least $1 - 2\exp(-c_{10} 2^{s_2} t) \geq 1 - 2\exp(-c_{11} t \log N)$, because $t \geq c_8$.

Finally, for the last step, consider the sets supported on at most two coordinates, and thus $\log|F_{s_2}|, \log|B_2| \lesssim \log N$. Therefore, with probability at least $1 - 2\exp(-c_{12} t \log N)$, for every $f \in F$ and $v \in B_m$,

\[ \Bigl| \sum_{i=1}^N (P_{J_{r_0 - 1}} v)_i (\pi_{s_2} f)(X_i) \Bigr| \leq c_{13} t d_{\psi_1} \log N. \]

Summing the three parts, it follows that for every $t \geq c_8$ and for every $m \leq N$, with probability at least $1 - C_1 \exp(-C_2 t \log N)$, for every $f \in F$ and every $v \in B_m$,

\[ \sum_{i=1}^N v_i f(X_i) \leq C_3 t \Bigl( \sum_{s \geq 1} 2^{s/2} \|\Delta_s f\|_{\psi_2} + d_{\psi_1} \sqrt{m} \log(eN/m) \Bigr). \]

Since there are at most $N$ possible values of $m$, the same holds for all $m \leq N$ uniformly, as claimed. ■

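Theorem B can be extended to other $\ell_p$ norms. Indeed, for $1 \leq p < 2$ and any $I \subset \{1, \ldots, N\}$, $\|x\|_{\ell_p^I} \leq |I|^{1/p - 1/2} \|x\|_{\ell_2^I}$. Hence, with probability at least $1 - 2\exp(-c_1 t \log N)$, for every $f \in F$ and $I \subset \{1, \ldots, N\}$,

\[ \Bigl( \sum_{i \in I} |f|^p(X_i) \Bigr)^{1/p} \leq c_\alpha t \bigl( \gamma_2(F, \psi_2) |I|^{1/p - 1/2} + d_{\psi_\alpha} |I|^{1/p} \log^{1/\alpha}(eN/|I|) \bigr). \tag{3.8} \]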
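The only deterministic ingredient in the passage from $\ell_2$ to $\ell_p$ is the comparison $\|x\|_{\ell_p^I} \leq |I|^{1/p - 1/2}\|x\|_{\ell_2^I}$ for $1 \leq p \leq 2$. A minimal sketch that checks it on every nonempty index set of a small vector follows; the helper names are illustrative and not part of the paper.

```python
import itertools

def lp_norm_on(x, idx, p):
    """(sum_{i in idx} |x_i|^p)^(1/p): the l_p norm of x restricted to idx."""
    return sum(abs(x[i]) ** p for i in idx) ** (1.0 / p)

def holder_bound_on(x, idx, p):
    """|I|^(1/p - 1/2) * ||x||_{l_2^I}, the upper bound valid for 1 <= p <= 2."""
    return len(idx) ** (1.0 / p - 0.5) * lp_norm_on(x, idx, 2)

def holds_on_all_subsets(x, p, tol=1e-12):
    """Check the comparison on every nonempty subset of coordinates."""
    indices = range(len(x))
    for size in range(1, len(x) + 1):
        for idx in itertools.combinations(indices, size):
            if lp_norm_on(x, idx, p) > holder_bound_on(x, idx, p) + tol:
                return False
    return True
```

Equality is attained by vectors that are constant in absolute value on $I$, which is why the factor $|I|^{1/p - 1/2}$ cannot be improved.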
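For $p > 2$ let $m_0$ be the smallest integer for which

\[ \gamma_2(F, \psi_2) \leq d_{\psi_\alpha} \sqrt{m} \log^{1/\alpha}(eN/m). \]

Then, by Theorem B, for every $|I| \leq m_0$,

\[ \Bigl( \sum_{i \in I} |f|^p(X_i) \Bigr)^{1/p} \leq \Bigl( \sum_{i \in I} f^2(X_i) \Bigr)^{1/2} \leq 2 c_\alpha t \gamma_2(F, \psi_2). \]

For larger values of $|I|$, if we denote $(u_i)_{i=1}^N = (f(X_i))_{i=1}^N$ then for $j \geq m_0$,

\[ u_j^* \leq \Bigl( \frac{1}{j} \sum_{i=1}^j (u_i^*)^2 \Bigr)^{1/2} \leq 2 c_\alpha t d_{\psi_\alpha} \log^{1/\alpha}(eN/j). \]

Hence, by the triangle inequality, for $|I| \geq m_0$,

\[ \Bigl( \sum_{i \in I} |f|^p(X_i) \Bigr)^{1/p} \leq c_{\alpha, p} t \bigl( \gamma_2(F, \psi_2) + d_{\psi_\alpha} |I|^{1/p} \log^{1/\alpha}(eN/|I|) \bigr). \tag{3.9} \]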

Let us mention that the estimate for $p = 1$ (the weakest of all the estimates for $1 \leq p < 2$) was proved in [25] using a simpler chaining argument.

3.1 Optimality

We begin this section by recalling the observation made in the introduction, that Theorem B is sharp when $F$ is a class of linear functionals on $\mathbb{R}^n$ and $\mu$ is the canonical gaussian measure on $\mathbb{R}^n$:

Lemma 3.5 There exists an absolute constant $c$ for which the following holds. Let $K \subset \mathbb{R}^n$, set $F = \{\langle x, \cdot \rangle : x \in K\}$ and put $\mu$ to be the canonical gaussian measure on $\mathbb{R}^n$. Then, for every integer $N$ and any $1 \leq m \leq N$,

\[ \mathbb{E} \sup_{f \in F} \max_{|I| = m} \Bigl( \sum_{i \in I} f^2(X_i) \Bigr)^{1/2} \geq c \bigl( \gamma_2(F, \psi_2) + d_{\psi_2} \sqrt{m \log(eN/m)} \bigr). \]

Although Lemma 3.5 indicates that Theorem B cannot be improved, one might argue that it is a somewhat degenerate case, because of the equivalence between the $\psi_2$ norm and the $L_2$ one. The next lemma shows that in general, one cannot replace the $\psi_2$ norm in the $\gamma_2$ term with any other $\psi_\alpha$ norm for $\alpha < 2$.

Lemma 3.6 There exists an absolute constant $c_1$ for which the following holds. For every integer $N$, $1 \leq \alpha < 2$ and a number $R$, there is a probability space $(\Omega, \mu)$ and a class $F$ consisting of mean-zero functions on $(\Omega, \mu)$, such that if $(X_i)_{i=1}^N$ are independent, distributed according to $\mu$, then with probability at least $c_1$,

\[ \sup_{f \in F} \Bigl| \sum_{i=1}^N f(X_i) \Bigr| \geq R \gamma_2(F, \psi_\alpha) \sqrt{N}, \]

and in particular, $\sup_{f \in F} \bigl( \sum_{i=1}^N f^2(X_i) \bigr)^{1/2} \geq R \gamma_2(F, \psi_\alpha)$.



Remark 3.7 As we indicated in the introduction, Lemma 3.6 shows that in general, $\mathbb{E} \sup_{f \in F} |N^{-1} \sum_{i=1}^N f(X_i) - \mathbb{E} f|$ cannot be controlled using a weaker deterministic parameter than $\gamma_2(F, \psi_2)/\sqrt{N}$.

For the proof of Lemma 3.6 we need the following formulation of the Paley-Zygmund inequality [18].

Lemma 3.8 Let $Z$ be a random variable. Then, for every $q > p \geq 1$ and $0 < \lambda < 1$,

\[ \Pr\bigl( |Z| > \lambda \|Z\|_{L_p} \bigr) \geq \Bigl( (1 - \lambda^p) \bigl( \|Z\|_{L_p} / \|Z\|_{L_q} \bigr)^p \Bigr)^{q/(q-p)}. \]



Proof of Lemma 3.6. Fix $1 \leq \alpha < 2$ and an integer $n$. Let $Y$ be a symmetric random variable with density $c_\alpha \exp(-|t|^\alpha)$ and set $X = (Y_1, \ldots, Y_n) \in \mathbb{R}^n$, a vector of independent copies of $Y$. Consider the probability space $(\mathbb{R}^n, \mu)$, with $\mu$ defined by $\mu(A) = \Pr(X \in A)$, let $(e_i)_{i=1}^n$ be the standard basis of $\mathbb{R}^n$, set $K = \{e_i/\sqrt{\log(i+1)} : 1 \leq i \leq n\}$ and put

\[ F = \{\langle e_i, \cdot \rangle / \sqrt{\log(i+1)} : 1 \leq i \leq n\}. \]

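One can show (see, for example, Proposition 7 in [5]) that if $(x_i)_{i=1}^n$ is nonnegative and non-increasing, then for every $p \geq 1$,

\[ \Bigl\| \sum_{i=1}^n x_i Y_i \Bigr\|_{L_p} \sim p^{1/\alpha} \|(x_i)_{i \leq p}\|_{\alpha^*} + \sqrt{p} \|(x_i)_{i > p}\|_2, \tag{3.10} \]

where $\| \cdot \|_{\alpha^*}$ is the $\ell_{\alpha^*}$ norm for $\alpha^*$ satisfying $1/\alpha + 1/\alpha^* = 1$. Since $\alpha < 2$ then $\alpha^* > 2$ and thus $\| \sum_{i=1}^n x_i Y_i \|_{L_p} \leq c_1 p^{1/\alpha} \|x\|_2$. In particular, for every $x \in \mathbb{R}^n$, $\|\langle x, \cdot \rangle\|_{\psi_\alpha} \leq c_2 \|x\|_2$. Moreover, $\|\langle x, \cdot \rangle\|_{\psi_\alpha} \geq c_3 \|x\|_2$, implying that the $\ell_2$ and the $\psi_\alpha(\mu)$ norms are equivalent on $\mathbb{R}^n$.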

It is also straightforward to show that there is an absolute constant $c_4$ such that if $(g_i)_{i=1}^m$ are independent, standard gaussian variables then for every $m$,

\[ \mathbb{E} \max_{1 \leq i \leq m} \frac{|g_i|}{\sqrt{\log(i+1)}} \leq c_4. \]

Therefore, by the Majorizing Measures Theorem

\[ \gamma_2(F, \psi_\alpha) \leq c_2 \gamma_2(K, \ell_2) \leq c_5 \, \mathbb{E} \max_{1 \leq i \leq n} \frac{|g_i|}{\sqrt{\log(i+1)}} \leq c_6. \]

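On the other hand, fix $N$ to be named later and consider $q > p \geq N$. Observe that by (3.10), for these values of $q$, $p$ and $N$, if $(Y_i)_{i=1}^N$ are independent copies of $Y$ then

\[ \Bigl\| \sum_{i=1}^N Y_i \Bigr\|_{L_p} \sim p^{1/\alpha} N^{1 - 1/\alpha} \quad \text{and} \quad \Bigl\| \sum_{i=1}^N Y_i \Bigr\|_{L_q} \sim q^{1/\alpha} N^{1 - 1/\alpha}. \]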

Let $X_1, \ldots, X_N$ be independent copies of the random vector $X$, set $Y_{i,j}$ to be the $j$-th coordinate of $X_i$ and put $Z_j = \sum_{i=1}^N Y_{i,j}$. Applying the Paley-Zygmund inequality, it follows that there are $\beta > 1$ and $c_7$, both depending on $\alpha$, such that if $p = c_7 \log n$ and $q = \beta p$, then for every $j$,

\[ \Pr\bigl( |Z_j| \geq c_8 (\log^{1/\alpha} n) N^{1 - 1/\alpha}/2 \bigr) \geq \Pr\bigl( |Z_j| > \|Z_j\|_{L_p}/2 \bigr) \geq 1/n. \]

Hence, by the independence of $(Z_j)_{j=1}^n$,

\[ \Pr\bigl( \exists\, 1 \leq j \leq n, \ |Z_j| \geq c_8 N^{1 - 1/\alpha} \log^{1/\alpha} n / 2 \bigr) \geq c_9. \]
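In particular, with that probability,

\[ \sup_{f \in F} \Bigl| \sum_{i=1}^N f(X_i) \Bigr| = \max_{1 \leq j \leq n} \Bigl| \sum_{i=1}^N \Bigl\langle \frac{e_j}{\sqrt{\log(j+1)}}, X_i \Bigr\rangle \Bigr| = \max_{1 \leq j \leq n} \frac{1}{\sqrt{\log(j+1)}} \Bigl| \sum_{i=1}^N Y_{i,j} \Bigr| \geq c_{10} \frac{N^{1 - 1/\alpha} \log^{1/\alpha} n}{\sqrt{\log(n+1)}} \geq c_{11} \sqrt{N} \Bigl( \frac{\log n}{N} \Bigr)^{1/\alpha - 1/2}. \]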

All that remains now is to find the connection between $N$ and $n$, where we already assumed that $p = c_7 \log n \geq N$. Clearly, if $N \ll \log n$ then $(\log n/N)^{1/\alpha - 1/2}$ can be made arbitrarily large by increasing $n$, as claimed. ■



4 Decomposing F 

Here, we will present a decomposition of $F$ into the sum of two sets, representing its peaky and regular parts. We will show that for every $N$, one can truncate functions in $F$ at the level

\[ \lambda \sim d_{\psi_\alpha} \log^{1/\alpha}\bigl( c d_{\psi_\alpha} N^{1/2} / \gamma_2(F, \psi_2) \bigr). \]

The resulting unbounded or peaky part of each $f \in F$ has coordinate projections with a well behaved $\ell_2$ norm and short support. On the other hand, the regular part of $f$ is bounded in $L_\infty$ by $\lambda$, and, moreover, its typical coordinate projection is contained in $c d_{\psi_\alpha} B_{\psi_\alpha^N}$. Thus, the regular part of $F$ behaves as if $F$ has an envelope function $W(x) = \sup_{f \in F} |f(x)|$ with $\|W\|_{\psi_\alpha} \lesssim d_{\psi_\alpha}$.

This decomposition gives a hint of why it is reasonable to hope that the supremum of the empirical process $\sup_{f \in F} |P_N f^2 - P f^2|$ is well behaved. Although the peaky part of $F$ exhibits no concentration, its $\ell_2$ diameter is small, and thus there is no need for cancellation to control it. Since the regular part of $F$ behaves as if it has a reasonable envelope function, powers concentrate around their mean uniformly.

To formulate the decomposition theorem (which implies Theorem C) we will use the following observations. Recall that if $x \in \mathbb{R}^N$ then for $1 \leq \alpha \leq 2$, $\|x\|_{\psi_\alpha^N} = \inf\{C : N^{-1} \sum_{i=1}^N \exp((|x_i|/C)^\alpha) \leq 2\}$. It follows that for every $x \in \mathbb{R}^N$,

\[ x_i^* \leq c \|x\|_{\psi_\alpha^N} \log^{1/\alpha}(eN/i), \]

and, in fact, this behavior of a monotone rearrangement characterizes the $\psi_\alpha^N$ norm. It is also standard to verify that if $X$ is a $\psi_\alpha$ random variable on $(\Omega, \mu)$ and $(X_i)_{i=1}^N$ are independent copies of $X$, then for $t \geq c_0$, with probability at least $1 - 2\exp(-t^\alpha \log N)$, for every $i$,

\[ X_i^* \leq c_1 t \|X\|_{\psi_\alpha} \log^{1/\alpha}(eN/i). \]

Hence, a typical coordinate projection of an independent sample of a single function $f \in L_{\psi_\alpha}$ satisfies that with high probability, $\|(f(X_i))_{i=1}^N\|_{\psi_\alpha^N} \leq c_2 \|f\|_{\psi_\alpha}$. In what follows, given $v = (f(X_i))_{i=1}^N$ we will sometimes denote the random norms $\|f\|_{\ell_p^N}$ and $\|f\|_{\psi_\alpha^N}$ by $\|v\|_p$ and $\|v\|_{\psi_\alpha^N}$ respectively.
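The link between the $\psi_\alpha^N$ norm and the monotone rearrangement can be made concrete. If $C = \|x\|_{\psi_\alpha^N}$ then $i \exp((x_i^*/C)^\alpha) \leq \sum_{j \leq i} \exp((x_j^*/C)^\alpha) \leq 2N$, so $x_i^* \leq C \log^{1/\alpha}(2N/i) \leq C \log^{1/\alpha}(eN/i)$. The sketch below approximates the norm by bisection and evaluates the gap in this inequality; the bracketing endpoints and function names are choices made for this illustration only.

```python
import math

def psi_alpha_norm(x, alpha, iters=200):
    """Upper bisection endpoint for inf{C : mean(exp((|x_i|/C)^alpha)) <= 2}."""
    n = len(x)
    top = max(abs(v) for v in x)
    if top == 0.0:
        return 0.0

    def average(C):
        return sum(math.exp((abs(v) / C) ** alpha) for v in x) / n

    hi = top / math.log(2.0) ** (1.0 / alpha)      # every term is at most 2 here
    lo = top / math.log(2.0 * n) ** (1.0 / alpha)  # the largest term alone reaches 2n here
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if average(mid) <= 2.0:
            hi = mid
        else:
            lo = mid
    return hi

def rearrangement_gap(x, alpha):
    """max_i of x_i^* - ||x|| * log^{1/alpha}(2N/i); nonpositive up to rounding."""
    n = len(x)
    C = psi_alpha_norm(x, alpha)
    decreasing = sorted((abs(v) for v in x), reverse=True)
    return max(v - C * math.log(2.0 * n / i) ** (1.0 / alpha)
               for i, v in enumerate(decreasing, start=1))
```

The lower bracket keeps every exponent below $\log(2N)$, so the evaluation never overflows.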

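Theorem 4.1 There exist absolute constants $c_0, \ldots, c_7$ for which the following holds. For any $1 \leq \alpha \leq 2$ and an integer $N$ set

\[ \lambda = c_0 d_{\psi_\alpha} \max\bigl\{ \log^{1/\alpha}\bigl( c_0 d_{\psi_\alpha} \sqrt{N} / \gamma_2(F, \psi_2) \bigr), 1 \bigr\}. \]

For any $t \geq c_1$ there are sets $F_1$ and $F_2$ that depend on $N$, $\lambda$ and $t$ such that $F \subset F_1 + F_2$, and with probability at least $1 - 2\exp(-c_2 t \log N)$,

1. $\sup_{f \in F_1} \|f\|_{\ell_2^N} \leq c_3 t \gamma_2(F, \psi_2)$, $\sup_{f \in F_1} |\mathrm{supp}(P_\sigma f)| \leq c_3 \gamma_2^2(F, \psi_2)/\lambda^2$, and $\sup_{f \in F_1} \mathbb{E}|f|^2 \leq c_3 \gamma_2^2(F, \psi_2)/N$.

2. $\sup_{f \in F_2} \|f\|_{L_\infty} \leq \lambda t$ and $\sup_{f \in F_2} \|f\|_{\psi_\alpha^N} \leq c_4 t d_{\psi_\alpha}$.

3. For every $u \geq c_5$, with probability at least $1 - 2\exp(-c_6 u^2)$, one has $\sup_{f \in F_2} |P_N f^2 - P f^2| \leq c_7 u t \lambda \gamma_2(F, \psi_2)/\sqrt{N}$.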

The proof of Theorem 4.1 requires all the information we have about the structure of the set $P_\sigma F = \{(f(X_i))_{i=1}^N : f \in F\} \subset \mathbb{R}^N$. Our starting point is the next observation.

Lemma 4.2 There exist absolute constants $c_1$, $c_2$ and $c_3$ for which the following holds. Let $v \in \mathbb{R}^N$ for which there are $A$, $B$ and $1 \leq \alpha \leq 2$ such that for every $I \subset \{1, \ldots, N\}$,

\[ \Bigl( \sum_{i \in I} v_i^2 \Bigr)^{1/2} \leq A + B \sqrt{|I|} \log^{1/\alpha}(eN/|I|). \]

If $\beta \geq c_1 B \max\{\log^{1/\alpha}(c_2 N B^2/A^2), 1\}$ and $E_\beta = \{i : |v_i| > \beta\}$, then

\[ |E_\beta| \leq \max\Bigl\{ \frac{4A^2}{\beta^2}, \ eN \exp(-(\beta/2B)^\alpha) \Bigr\} \quad \text{and} \quad \Bigl( \sum_{i \in E_\beta} v_i^2 \Bigr)^{1/2} \leq c_3 A. \]

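Proof. Clearly, for every integer $n$, $\|x\|_{\ell_1^n} \leq \sqrt{n} \|x\|_{\ell_2^n}$. Hence, for every $I \subset \{1, \ldots, N\}$, $\sum_{i \in I} |v_i| \leq A \sqrt{|I|} + B |I| \log^{1/\alpha}(eN/|I|)$. Let $E_\beta = \{i : |v_i| > \beta\}$ and note that

\[ \beta |E_\beta| \leq \sum_{i \in E_\beta} |v_i| \leq A |E_\beta|^{1/2} + B |E_\beta| \log^{1/\alpha}(eN/|E_\beta|). \]

If $B |E_\beta| \log^{1/\alpha}(eN/|E_\beta|) \leq \beta |E_\beta|/2$ then $|E_\beta| \leq 4A^2/\beta^2$. Otherwise, if the reverse inequality holds, then $|E_\beta| \leq eN \exp(-(\beta/2B)^\alpha)$. Thus,

\[ |\{i : |v_i| > \beta\}| \leq \max\Bigl\{ \frac{4A^2}{\beta^2}, \ eN \exp(-(\beta/2B)^\alpha) \Bigr\}. \tag{4.1} \]

To complete the proof, let $\beta \geq c_1 B \max\{\log^{1/\alpha}(c_2 N B^2/A^2), 1\}$. Therefore, $|E_\beta| \lesssim A^2/\beta^2$, and thus, for our choice of $\beta$,

\[ \Bigl( \sum_{i \in E_\beta} v_i^2 \Bigr)^{1/2} \lesssim A + B \frac{A}{\beta} \log^{1/\alpha}(eN\beta^2/A^2) \leq c_3 A. \]
■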
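The counting estimate (4.1) is purely deterministic, so it can be tested directly. In the sketch below, for a given vector and a chosen $B$, the smallest admissible $A$ is computed from the largest coordinates (the worst subset of each cardinality), and the size of the level set is compared with the right-hand side of (4.1). The function names and the sample values used with them are illustrative assumptions.

```python
import math

def smallest_A(v, B, alpha):
    """Least A >= 0 with (sum_{i in I} v_i^2)^{1/2} <= A + B*sqrt(|I|)*log^{1/alpha}(eN/|I|) for all I."""
    N = len(v)
    squares = sorted((x * x for x in v), reverse=True)
    A, partial = 0.0, 0.0
    for k, sq in enumerate(squares, start=1):
        partial += sq  # sum of the k largest squares: the worst subset of size k
        slack = math.sqrt(partial) - B * math.sqrt(k) * math.log(math.e * N / k) ** (1.0 / alpha)
        A = max(A, slack)
    return A

def level_set_bound(A, B, alpha, beta, N):
    """Right-hand side of (4.1): max{4A^2/beta^2, eN*exp(-(beta/2B)^alpha)}."""
    return max(4.0 * A * A / beta ** 2,
               math.e * N * math.exp(-(beta / (2.0 * B)) ** alpha))

def level_set_size(v, beta):
    """|{i : |v_i| > beta}|."""
    return sum(1 for x in v if abs(x) > beta)
```

Since the two cases in the proof hold for every $\beta > 0$, the comparison `level_set_size(v, beta) <= level_set_bound(...)` should succeed for any positive threshold, not only for the specific choice of $\beta$ made at the end of the proof.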



Proof of Theorem 4.1. Fix $t \geq c_0$ and recall that by Theorem B, with probability at least $1 - 2\exp(-c_1 t \log N)$, for every $v \in P_\sigma F$ the assumptions of Lemma 4.2 hold with $A \sim t \gamma_2(F, \psi_2)$ and $B \sim t d_{\psi_\alpha}$. Just as in Lemma 4.2, set

\[ \beta \sim B \max\{\log^{1/\alpha}(c_2 N B^2/A^2), 1\} = \lambda t. \]

Let $\phi(f) = \mathrm{sgn}(f) \min\{|f|, \beta\}$ and $\psi(f) = f - \phi(f)$, put $F_1 = \{\psi(f) : f \in F\}$, $F_2 = \{\phi(f) : f \in F\}$ and observe that $F \subset F_1 + F_2$.
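The truncation that defines the two parts is elementary and can be written out pointwise. The following sketch implements $\phi$ and $\psi$ on real numbers (the function names mirror the symbols above; the threshold `beta` is whatever level one chooses) and is meant only to illustrate the splitting $f = \phi(f) + \psi(f)$.

```python
def phi(x, beta):
    """Truncation sgn(x) * min(|x|, beta): the bounded (regular) part."""
    return max(-beta, min(beta, x))

def psi(x, beta):
    """Remainder x - phi(x): the unbounded (peaky) part, zero when |x| <= beta."""
    return x - phi(x, beta)

def split(values, beta):
    """Split a sample vector into its regular and peaky coordinate vectors."""
    regular = [phi(x, beta) for x in values]
    peaky = [psi(x, beta) for x in values]
    return regular, peaky
```

The peaky vector is supported on the coordinates where $|x| > \beta$, and there its absolute value is $|x| - \beta$. The regular part satisfies $|\phi(a)^2 - \phi(b)^2| \leq 2\beta|a - b|$, which is the Lipschitz property used later for the empirical process indexed by $F_2$.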

Let us consider $F_1$, which is the unbounded part of $F$. Note that for every $f \in F$, if we set $u_i = (\psi(f))(X_i)$, then $\{i : |u_i| > \beta\} \subset \{i : |f(X_i)| > \beta\} = E_\beta$, and on that set $|u_i| = |f(X_i)| - \beta$. Hence, by Lemma 4.2,

\[ \Bigl( \sum_{j \in E_\beta} u_j^2 \Bigr)^{1/2} \leq c_3 t \gamma_2(F, \psi_2). \]

Also, since $\|f\|_{\psi_\alpha} \leq d_{\psi_\alpha}$ then by integrating the tail, one may verify that

\[ \mathbb{E}|\psi(f)|^2 \leq c_4 \beta^2 \exp(-c_5 (\beta/d_{\psi_\alpha})^\alpha) \leq c_6 \gamma_2^2(F, \psi_2)/N, \]

proving the first part of the claim.

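Turning to the second part, note that if $f \in F$ and $w = P_\sigma(\phi(f))$ then $\|w\|_\infty \leq \beta$. Let $m = A^2/\beta^2$ and observe that

\[ \gamma_2(F, \psi_2) \sim d_{\psi_\alpha} \sqrt{m} \max\{\log^{1/\alpha}(eN/m), 1\}. \]

First, assume that $m \leq N$. Therefore, since $\beta \leq c_7 t d_{\psi_\alpha} \log^{1/\alpha}(eN/m)$ then for every $j \leq m$, $w_j^* \leq \beta \leq c_7 t d_{\psi_\alpha} \log^{1/\alpha}(eN/j)$. Moreover, if $j > m$ then by Theorem B,

\[ \Bigl( \sum_{i=1}^j (w_i^*)^2 \Bigr)^{1/2} \leq t \bigl( \gamma_2(F, \psi_2) + d_{\psi_\alpha} \sqrt{j} \log^{1/\alpha}(eN/j) \bigr) \lesssim t d_{\psi_\alpha} \sqrt{j} \log^{1/\alpha}(eN/j). \]

Therefore,

\[ w_j^* \leq \Bigl( \frac{1}{j} \sum_{i=1}^j (w_i^*)^2 \Bigr)^{1/2} \lesssim t d_{\psi_\alpha} \log^{1/\alpha}(eN/j), \]

and thus, $\sup_{f \in F_2} \|f\|_{\psi_\alpha^N} \lesssim t d_{\psi_\alpha}$, as claimed.

On the other hand, if $m > N$ then $\lambda \sim d_{\psi_\alpha}$. Hence, $\sup_{f \in F_2} \|f\|_{L_\infty} \leq \beta \lesssim t d_{\psi_\alpha}$, implying that $\sup_{f \in F_2} \|f\|_{\psi_\alpha^N} \lesssim t d_{\psi_\alpha}$.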

It remains to estimate the supremum of the empirical process indexed by $|\phi(f)|^2$. Since $\phi(x) = \mathrm{sgn}(x) \min\{|x|, \beta\}$ is 1-Lipschitz, then for every $f_1, f_2 \in F$, $\bigl| |\phi(f_1)|^2 - |\phi(f_2)|^2 \bigr| \leq 2\beta |f_1 - f_2|$ pointwise. In particular, $\bigl\| |\phi(f_1)|^2 - |\phi(f_2)|^2 \bigr\|_{\psi_2} \leq 2\beta \|f_1 - f_2\|_{\psi_2}$. Therefore, by a standard chaining argument, for every $u \geq c_9$, with probability at least $1 - 2\exp(-c_{10} u^2)$, $\sup_{h \in F_2} |P_N h^2 - P h^2| \leq c_{11} u \beta \gamma_2(F, \psi_2)/\sqrt{N}$, as claimed. ■

Theorem 4.1 should be compared with a result due to Rudelson (see [33]). Although Rudelson's result was formulated for selector processes, an analogous result holds for empirical processes and with essentially the same proof as the original one.

Theorem 4.3 For every $0 < \delta < 1$ there is a constant $c(\delta)$ for which the following holds. If $F$ is a class of mean-zero functions then there are sets $F_1$ and $F_2$ such that $F \subset F_1 + F_2$, and with $\mu$-probability at least $1 - \delta$,

\[ \sup_{f \in F_1} \|f\|_{\ell_2^N} \leq c(\delta) \sqrt{N} d_{L_2}, \qquad \sup_{f \in F_2} \|f\|_{\ell_1^N} \leq c(\delta) \, \mathbb{E} \sup_{f \in F} \Bigl| \sum_{i=1}^N \varepsilon_i f(X_i) \Bigr|. \]

In particular, with probability at least $1 - \delta$,

\[ P_\sigma F \subset c(\delta) \bigl( R_N \sqrt{N} B_1^N + d_{L_2} \sqrt{N} B_2^N \bigr), \]

where

\[ R_N = \frac{1}{\sqrt{N}} \mathbb{E} \sup_{f \in F} \Bigl| \sum_{i=1}^N \varepsilon_i f(X_i) \Bigr| \quad \text{and} \quad d_{L_2} = \sup_{f \in F} (\mathbb{E} f^2)^{1/2}. \]

Let us compare Theorem 4.1 with Theorem 4.3. First, observe that the first two parts of Theorem 4.1 imply that

\[ P_\sigma F \subset c(\delta) \bigl( \gamma_2(F, \psi_2) B_2^N + \lambda B_\infty^N \cap c d_{\psi_\alpha} B_{\psi_\alpha^N} \bigr), \]

and that for large values of $N$, that is, when $\lambda > 1$, it is evident that $\gamma_2(F, \psi_2) \leq \sqrt{N} d_{\psi_\alpha}$. In particular, since $B_{\psi_\alpha^N} \subset c_1 \sqrt{N} B_2^N$, Theorem 4.1 implies that for large $N$,

\[ P_\sigma F \subset c_2(\delta) d_{\psi_\alpha} \sqrt{N} B_2^N. \]

Hence, if we are in a situation where the $\psi_\alpha$ and the $L_2$ metrics are equivalent, then Theorem 4.1 is stronger than Theorem 4.3 since the $\sqrt{N} B_1^N$ component is not needed.

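In fact, the gap between the two results can be considerable. As an example, let $F = \{\langle x, \cdot \rangle : x \in S^{n-1}\}$ and set $\mu$ to be the canonical gaussian measure on $\mathbb{R}^n$. Then, by Theorem 4.1, with high probability

\[ P_\sigma F \subset c(\delta) \bigl( \sqrt{n} B_2^N + \lambda B_\infty^N \cap c B_{\psi_2^N} \bigr). \]

On the other hand, since

\[ \frac{1}{\sqrt{N}} \mathbb{E} \sup_{f \in F} \Bigl| \sum_{i=1}^N \varepsilon_i f(X_i) \Bigr| \sim \frac{1}{\sqrt{N}} \mathbb{E} \Bigl( \sum_{i=1}^N \|X_i\|_{\ell_2^n}^2 \Bigr)^{1/2} \sim \sqrt{n}, \]

then Theorem 4.3 only yields that

\[ P_\sigma F \subset c(\delta) \bigl( \sqrt{nN} B_1^N + \sqrt{N} B_2^N \bigr), \]

which is a much weaker estimate.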

The reason for the gap between the results is that Theorem 4.1 is tailored for situations in which one has additional information on the tails of functions in the class, and in return gets more structural information on the peaky part of coordinate projections. On the other hand, the assumptions of Theorem 4.3 can only give little information on the peaky part of $F$, which is captured by the $\ell_1^N$ component of that decomposition. Indeed, at best, for a fixed, reasonable class of functions $F$, one may expect that

\[ \frac{1}{\sqrt{N}} \mathbb{E} \sup_{f \in F} \Bigl| \sum_{i=1}^N \varepsilon_i f(X_i) \Bigr| \leq c(F). \]

Thus Theorem 4.3 only yields

\[ P_\sigma F \subset c(\delta) \bigl( c(F) \sqrt{N} B_1^N + d_{L_2} \sqrt{N} B_2^N \bigr), \]

but with no further information on the way the coordinates are distributed in the $\ell_1^N$ component of the decomposition. In the geometric applications we are interested in, the constant $c(F)$ grows with the "dimension" of the class (in the example presented above, $c(F) \sim \sqrt{n}$), making Theorem 4.3 too weak for the analysis of such problems.

We end this section with a formulation of a simple application of the proof of Theorem 4.1.

Corollary 4.4 There exist absolute constants $c_0$, $c_1$, $c_2$ and $c_3$ for which the following holds. Let $F$ be a class of mean-zero functions and for every $N$ set $\lambda = c_0 d_{\psi_1} \max\{\log(c_0 d_{\psi_1} N^{1/2}/\gamma_2(F, \psi_2)), 1\}$. Then, for every $t \geq c_1$, with probability at least $1 - 2\exp(-c_2 \min\{t \log N, t^2\})$,

\[ \sup_{f \in F} \Bigl| \frac{1}{N} \sum_{i=1}^N f^2(X_i) - \mathbb{E} f^2 \Bigr| \leq c_3 t^2 \max\Bigl\{ \lambda \frac{\gamma_2(F, \psi_2)}{\sqrt{N}}, \ \frac{\gamma_2^2(F, \psi_2)}{N} \Bigr\}. \]

It is important to note that using the $L_\infty$ bound to obtain a concentration result for $F_2$ (as one does in Corollary 4.4) leads to a logarithmic looseness. Indeed, to obtain the correct estimate on the expectation of $\sup_{f \in F} |P_N f^2 - P f^2|$ one has to truncate functions at a level $\sim d_{\psi_1}$. This is impossible even if one considers a single gaussian random variable. It is true that for small values of $N$ - when $d_{\psi_1} N^{1/2} \ll \gamma_2(F, \psi_2)$, the level of truncation is the required one, but the resulting estimate on $\sup_{f \in F} |P_N f^2 - P f^2|$ is trivial. Indeed, for those values of $N$ there is no real concentration and the bound reflects an estimate on the empirical diameter $\sup_{f \in F} (P_N f^2)^{1/2}$. On the other hand, when $d_{\psi_1} N^{1/2} \sim \gamma_2(F, \psi_2)$ and beyond, one starts seeing true concentration, but then the best possible level of truncation for those values of $N$ is off by a logarithmic factor from the required one. Thus, even with a sharp decomposition theorem at our disposal, a contraction based estimate on the empirical process indexed by $F_2$ leads to a superfluous $\log N$ factor. Despite that, this type of a decomposition argument is strong enough
for many applications (see, for example, [£l[15j[25], and most notably, in PQ), 
because in those cases all the required information is when $d_{\psi_1} \sqrt{N}$ is proportional to the complexity parameter of the class, rather than for larger values of $N$.

If one wishes to obtain the correct estimate on $\sup_{f \in F} |P_N f^2 - P f^2|$ for larger values of $N$, more accurate information on the "bounded part" of $F$ is needed. This is not surprising because decomposition theorems like Theorem 4.1 are based solely on deviation estimates and on bounds on the $\ell_2$ norms of monotone rearrangements of $(f(X_i))_{i=1}^N$. On the other hand, the correct rates require some sort of "local" concentration bounds, and those are at the heart of the proof of Theorem A.

5 From a bounded diameter to concentration 

Here we will remove the superfluous logarithmic factor and prove Theorem A, by showing that if $F$ is a symmetric class of mean-zero functions, then with high probability and in expectation

\[ \sup_{f \in F} \Bigl| \frac{1}{N} \sum_{i=1}^N f^2(X_i) - \mathbb{E} f^2 \Bigr| \lesssim \max\Bigl\{ d_{\psi_1} \frac{\gamma_2(F, \psi_2)}{\sqrt{N}}, \ \frac{\gamma_2^2(F, \psi_2)}{N} \Bigr\}. \tag{5.1} \]

In particular, in the non-trivial range where there is actual concentration, the dominating term is $d_{\psi_1} \gamma_2(F, \psi_2)/\sqrt{N}$, which is a contraction type estimate with the maximal norm in $\psi_1$ taking the role of the maximal norm in $L_\infty$.
The source of difficulty in the proof of Theorem A is that the desired concentration does not follow from the individual concentration of each $N^{-1} \sum_{i=1}^N f^2(X_i)$ around its mean. Rather, it is a combination of two components. First, a tail estimate on the diameter of the "ends" of chains

\[ \sup_{f \in F} \Bigl( \sum_{i=1}^N (f - \pi_{\tau_N} f)^2(X_i) \Bigr)^{1/2}, \]

whose role in the chaining process is to capture the "peaky behavior" of $F$ that prevents concentration. The second component is an analysis of the Bernoulli process $\sup_{f \in F} |\sum_{i=1}^N \varepsilon_i (\pi_{\tau_N} f)^2(X_i)|$, conditioned on $(X_i)_{i=1}^N$. It captures the part of $F$ in which there is concentration. Moreover, the analysis of both parts has to be carried out without resorting to a "global" contraction argument, because the $L_\infty$ or $\psi_2$ diameters of the relevant sets may be too large.

As a starting point of the proof of Theorem A, consider an almost optimal admissible sequence of $F$ with respect to the $\psi_2$ metric. Let $\tau_N$ be the integer $s$ satisfying that $N/2 < 2^s \leq N$. One can show [21] that with high probability

\[ \sup_{f \in F} \Bigl( \frac{1}{N} \sum_{i=1}^N (f - \pi_{\tau_N} f)^2(X_i) \Bigr)^{1/2} \lesssim \frac{\gamma_2(F, \psi_2)}{\sqrt{N}}, \]

which is of the desired order of magnitude. This estimate is based on Bernstein's inequality, which implies that $(N^{-1} \sum_{i=1}^N h^2(X_i))^{1/2}$ behaves like a sum of i.i.d. $\psi_2$ random variables for "large" deviations.
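Next, one has to study

\[ \sup_{f \in F} \Bigl| \frac{1}{N} \sum_{i=1}^N (\pi_{\tau_N} f)^2(X_i) - \mathbb{E}(\pi_{\tau_N} f)^2 \Bigr|, \]

which, by a symmetrization argument behaves like

\[ \sup_{f \in F} \Bigl| \frac{1}{N} \sum_{i=1}^N \varepsilon_i (\pi_{\tau_N} f)^2(X_i) \Bigr|. \]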

To analyze this Bernoulli process one uses a chaining argument with the same, non-random admissible sequence, and thus one has to study the increments

\[ \Pr_\varepsilon\Bigl( \Bigl| \sum_{i=1}^N \varepsilon_i \bigl( (\pi_s f)^2 - (\pi_{s-1} f)^2 \bigr)(X_i) \Bigr| > t \Bigr), \tag{5.2} \]

conditioned on $(X_i)_{i=1}^N$. At every level $s \leq \tau_N$ one has to control the $2^{2^{s+1}}$ vectors in $\mathbb{R}^N$ of the form $(y_i)_{i=1}^N = ((\pi_s f - \pi_{s-1} f)(\pi_s f + \pi_{s-1} f)(X_i))_{i=1}^N$.

Since $\pi_s f \in F$ and thanks to Theorem B, one has very accurate information on the coordinate structure of $((\pi_s f + \pi_{s-1} f)(X_i))_{i=1}^N$. However, a similar result is required for the differences $((\pi_s f - \pi_{s-1} f)(X_i))_{i=1}^N$ which takes into account the $\psi_2$ distance between $\pi_s f$ and $\pi_{s-1} f$. The desired estimate is proved in Lemma 5.1 below.

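Finally, to bound (5.2), observe that for every $\ell \leq N$ and every $(y_i)_{i=1}^N$,

\[ \Pr\Bigl( \Bigl| \sum_{i=1}^N \varepsilon_i y_i \Bigr| > \sum_{i=1}^{\ell} y_i^* + t \Bigl( \sum_{i=\ell+1}^N (y_i^*)^2 \Bigr)^{1/2} \Bigr) \leq 2\exp(-t^2/2). \]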



This observation is used at the level $s$ of the chaining process for $t \sim 2^{s/2}$ and for different values of $\ell$ that depend both on $s$ and on the structure of each $(y_i)_{i=1}^N = ((\pi_s f - \pi_{s-1} f)(\pi_s f + \pi_{s-1} f)(X_i))_{i=1}^N$. The crucial point in determining $\ell$ is the number of coordinates on which $((\pi_s f - \pi_{s-1} f)(X_i))_{i=1}^N$ does not "behave regularly" in the sense of Theorem 4.1.
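The tail estimate for Bernoulli sums that splits off the $\ell$ largest coordinates can be verified exactly on short vectors by enumerating all sign patterns. The sketch below does this; it is an illustration under the assumption that the signs are uniform and independent, and the function name is not from the paper.

```python
import itertools
import math

def tail_probability(y, ell, t):
    """Exact Pr over uniform signs of
    |sum eps_i y_i| > sum_{i<=ell} y_i^* + t * (sum_{i>ell} (y_i^*)^2)^{1/2}."""
    decreasing = sorted((abs(v) for v in y), reverse=True)
    threshold = sum(decreasing[:ell]) + t * math.sqrt(sum(v * v for v in decreasing[ell:]))
    hits = 0
    for signs in itertools.product((-1.0, 1.0), repeat=len(y)):
        if abs(sum(e * v for e, v in zip(signs, y))) > threshold:
            hits += 1
    return hits / 2 ** len(y)
```

The bound follows from the triangle inequality on the $\ell$ largest coordinates and Hoeffding's inequality on the rest, so the returned probability should never exceed $2\exp(-t^2/2)$. The enumeration costs $2^N$ evaluations and is only meant for very small $N$.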

We begin the proof with a "local" version of Theorem 3.4 - for a finite class $H$, in which for every $h \in H$ the bound on $(\sum_{i \in I} h^2(X_i))^{1/2}$ is given using $\|h\|_{\psi_2} \sqrt{\log|H|}$ and $\|h\|_{\psi_1}$ rather than using the global parameters $\gamma_2(H, \psi_2)$ and $d_{\psi_1}$ that are used in Theorem B.

Lemma 5.1 There exist absolute constants $c_1$, $c_2$, $c_3$ and $c_4$ for which the following holds. Let $H$ be a class of mean-zero functions and set $k = \log|H|$. Then, for every $u \geq c_1$, with probability at least $1 - 2\exp(-c_2 \max\{k, \log N\} u)$, for every $I \subset \{1, \ldots, N\}$ and every $h \in H$,

\[ \Bigl( \sum_{i \in I} h^2(X_i) \Bigr)^{1/2} \leq c_4 u \bigl( \|h\|_{\psi_2} \sqrt{k} + \|h\|_{\psi_1} \sqrt{|I|} \log(eN/|I|) \bigr). \]

An analogous result holds for any $\psi_\alpha$ norm for $1 \leq \alpha \leq 2$, with $\log^{1/\alpha}(eN/|I|)$ taking the place of $\log(eN/|I|)$.

We will prove the lemma for $\alpha = 1$ since this is the only case we will actually use. The proof for $1 < \alpha \leq 2$ follows the same lines and is omitted.

The proof of Lemma 5.1 is very similar in nature to the proof of Theorem 3.4 and will use its notation. Again, we will denote by $E_m$ the collection of subsets of $\{1, \ldots, N\}$ of cardinality $m$.
Proof. Recall that for every $h \in H$,

\[ \max_{I \in E_m} \Bigl( \sum_{i \in I} h^2(X_i) \Bigr)^{1/2} \leq C \sup_{v \in B_m} \sum_{i=1}^N v_i h(X_i), \]

where $B_m$ was defined in (3.2). First, assume that $|H| \geq N$ and that $m = 2^{r_0}$ satisfies that $\kappa_0 m \log(eN/m) \geq \max\{\log|E_m|, \log|B_m|\}$. We can assume without loss of generality that $\log|E_m| \geq \log|H|$. Indeed, if $\log|E_m| < \log|H|$ then the required estimate follows easily from a $\psi_2$ estimate and the union bound, since $\log(|H| \cdot |B_m|) \leq c_0 \log|H|$.

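Recall that for every $v \in B_m$, $|\mathrm{supp}(v)| \leq m$ and that there is a set $I_{\ell_1}$ of cardinality $m/2$ such that $\|P_{I_{\ell_1}} v\|_\infty \leq 1/(m/2)^{1/2}$. Let $J_1$ be the complement of $I_{\ell_1}$ in $\mathrm{supp}(v)$, and so on for $\ell_r = m/2^r$, $r \leq r_1$, where $r_1$ will be named later. Observe that for every $v \in B_m$, $P_{J_r} v \in B_{\ell_r}$. Since $\max_{i \in I_{\ell_r}} \|v_i h(X_i)\|_{\psi_1} \leq \|h\|_{\psi_1}/\sqrt{\ell_r}$, then by Bernstein's inequality, for every $u_1$ larger than an absolute constant,

\[ \Pr\Bigl( \exists h \in H, v \in B_m, r \leq r_1 : \Bigl| \sum_{i \in I_{\ell_r}} v_i h(X_i) \Bigr| \geq u_1 \|h\|_{\psi_1} \sqrt{\ell_r} \log(eN/\ell_r) \Bigr) \leq 2|H| \sum_{r=1}^{r_1} |B_{\ell_r}| \exp(-c_1 u_1 \ell_r \log(eN/\ell_r)) \leq 2|H| \exp(-c_2 u_1 \ell_{r_1} \log(eN/\ell_{r_1})) = (*). \tag{5.3} \]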
Since $\ell_r = m/2^r$, we set $r_1$ to be the largest integer for which

\[ (m/2^{r_1}) \log(eN/(m/2^{r_1})) \geq \log|H| \]

and since $\log|E_m| \geq \log|H|$ such an integer exists. Thus, for $u_1 \geq c_3$ it is evident that $(*) \leq 2\exp(-c_4 u_1 \log|H|)$.

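Next, for every $v \in B_m$ consider the projection $P_{J_{r_1}} v$. Since $\|P_{J_{r_1}} v\|_2 \leq 1$ then $\| \sum_{i \in J_{r_1}} v_i h(X_i) \|_{\psi_2} \leq c_5 \|h\|_{\psi_2}$. Therefore,

\[ \Pr\Bigl( \exists v \in B_m, h \in H : \Bigl| \sum_{i \in J_{r_1}} v_i h(X_i) \Bigr| \geq u_2 \sqrt{\log|H|} \, \|h\|_{\psi_2} \Bigr) \leq 2|H| \cdot |B_{m/2^{r_1}}| \exp(-c_6 u_2^2 \log|H|) \leq \exp(-c_7 u_2 \log|H|), \]

provided that $u_2 \geq c_8$.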

Therefore, if $u$ is sufficiently large, then with probability at least $1 - 2\exp(-c_9 u \log|H|)$, for every $h \in H$

\[ \sup_{v \in B_m} \sum_{i=1}^N v_i h(X_i) \leq c_{10} u \Bigl( \|h\|_{\psi_2} \sqrt{\log|H|} + \|h\|_{\psi_1} \sum_{r=1}^{r_1} \sqrt{\ell_r} \log(eN/\ell_r) \Bigr) \leq c_{11} u \bigl( \|h\|_{\psi_2} \sqrt{\log|H|} + \|h\|_{\psi_1} \sqrt{m} \log(eN/m) \bigr). \]

Since $1 \leq m \leq N$ and $|H| \geq N$, the claim holds for any such $m$.

Now, assume that $|H| < N$. Then, set $r_1 = r_0$, and by (5.3), with probability at least $1 - 2\exp(-c_{12} u \log N)$, for every $h \in H$ and $v \in B_m$, $\sum_{i=1}^N v_i h(X_i) \leq c_{13} u \|h\|_{\psi_1} \sqrt{m} \log(eN/m)$. Again, summing the probabilities over every $1 \leq m \leq N$ the claim follows. ■

Recall that $\tau_N$ is the integer $s$ for which $N/2 < 2^s \leq N$. The sets $H$ we will be interested in are the sets of links at the level $s$, namely, $\Delta_s = \{\Delta_s(f) = \pi_s f - \pi_{s-1} f : f \in F\}$, for $s \leq \tau_N$, where $(F_s)_{s \geq 0}$ is an almost optimal admissible sequence of $F$ with respect to the $\psi_2$ norm.

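Let us summarize the information we have on the set

\[ \bigl\{ \bigl( (\Delta_s(f))(X_i) \cdot (\pi_s f + \pi_{s-1} f)(X_i) \bigr)_{i=1}^N : f \in F \bigr\}. \]

Consider the following events: let

\[ A_t = \Bigl\{ (X_i)_{i=1}^N : \forall I \subset \{1, \ldots, N\}, \ \sup_{f \in F} \Bigl( \sum_{i \in I} f^2(X_i) \Bigr)^{1/2} \leq \kappa_1 t \bigl( \gamma_2(F, \psi_2) + d_{\psi_1} \sqrt{|I|} \log(eN/|I|) \bigr) \Bigr\}, \tag{5.4} \]

and

\[ B_t^s = \Bigl\{ (X_i)_{i=1}^N : \forall f \in F, \ \forall I \subset \{1, \ldots, N\}, \ \Bigl( \sum_{i \in I} (\Delta_s f)^2(X_i) \Bigr)^{1/2} \leq \kappa_1 t \bigl( 2^{s/2} \|\Delta_s(f)\|_{\psi_2} + \|\Delta_s(f)\|_{\psi_1} \sqrt{|I|} \log(eN/|I|) \bigr) \Bigr\}, \tag{5.5} \]

where $\kappa_1$ is a suitable absolute constant.

By Theorem 3.4, for every $t \geq c_1$, $\Pr(A_t) \geq 1 - 2\exp(-c_2 t \log N)$, while applying Lemma 5.1 it is evident that for every $t \geq c_3$ and every $s \leq \tau_N$, $\Pr(B_t^s) \geq 1 - 2\exp(-c_4 \max\{2^s, \log N\} t)$.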



Let us consider the Bernoulli process

\[ \sum_{i=1}^N \varepsilon_i \bigl( (\pi_{\tau_N} f)^2(X_i) - f_0^2(X_i) \bigr) \]

conditioned on the set $\Omega_t = A_t \cap \bigl( \bigcap_{s \leq \tau_N} B_t^s \bigr)$. Observe that on $\Omega_t$ we have enough information to identify the cardinality of each set of "large" coordinates of individual functions $\Delta_s(f)$, and thus the point from which each vector $((\Delta_s(f))(X_i))_{i=1}^N$ behaves regularly. The parameter we will use to identify the point from which the regular behavior begins is

\[ m(\Delta_s(f)) = \min\bigl\{ m : 2^{s/2} \|\Delta_s(f)\|_{\psi_2} \leq \|\Delta_s(f)\|_{\psi_1} \sqrt{m} \log(eN/m) \bigr\}. \]
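As a small illustration of how this parameter is determined, the sketch below scans $m = 1, \ldots, N$ for the first index at which the $\psi_1$ term dominates. The inputs `psi2_term` (standing for $2^{s/2}\|\Delta_s(f)\|_{\psi_2}$) and `psi1_norm` (standing for $\|\Delta_s(f)\|_{\psi_1}$) are plain numbers supplied by the caller, and the fallback value $N$ when no such $m$ exists follows the convention adopted in the proof of Theorem 5.2. The function names are hypothetical.

```python
import math

def regular_scale(m, N):
    """sqrt(m) * log(e*N/m), the factor multiplying the psi_1 norm."""
    return math.sqrt(m) * math.log(math.e * N / m)

def regularity_index(psi2_term, psi1_norm, N):
    """Smallest m in 1..N with psi2_term <= psi1_norm * regular_scale(m, N); N if none."""
    for m in range(1, N + 1):
        if psi2_term <= psi1_norm * regular_scale(m, N):
            return m
    return N
```

A linear scan is used because $\sqrt{m}\log(eN/m)$ is increasing only for $m \leq N/e$, so a binary search over the whole range would not be justified without extra care.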

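Theorem 5.2 There exist absolute constants $c_1$, $c_2$, $c_3$ and $c_4$ for which the following holds. If $f_0 \in F$ then for every $t \geq c_1$, $u \geq c_2$ and $(X_i)_{i=1}^N \in \Omega_t$,

\[ \Pr_\varepsilon\Bigl( \exists f \in F, \ \Bigl| \sum_{i=1}^N \varepsilon_i \bigl( (\pi_{\tau_N} f)^2(X_i) - f_0^2(X_i) \bigr) \Bigr| \geq u \rho_t \Bigr) \leq 2\exp(-c_3 u^2), \]

where

\[ \rho_t = c_4 t^2 \bigl( \sqrt{N} d_{\psi_1} \gamma_2(F, \psi_2) + \gamma_2^2(F, \psi_2) \bigr). \]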

For the proof we will need the following definition.

Definition 5.3 Let $u = (u_i)_{i=1}^N$, $I \subset \{1, \ldots, N\}$ and $v = (v_i)_{i=1}^N$. We say that $v$ dominates $u$ on $I$ if for every $i \in I$, $(P_I u)_i^* \leq v_i^*$. In other words, if a monotone rearrangement of $P_I u$ is smaller than that of $v$ coordinate-wise on $I$.

Proof. Fix $f_0 \in F$ and for every $f \in F_{\tau_N}$ write $f^2 - f_0^2 = \sum_{s=1}^{\tau_N} \bigl( (\pi_s f)^2 - (\pi_{s-1} f)^2 \bigr)$. Let $1 \leq s \leq \tau_N$ and consider a link $(\pi_s f)^2 - (\pi_{s-1} f)^2$. Set $h_- = \pi_s f - \pi_{s-1} f$, $h_+ = \max\{\pi_s f, \pi_{s-1} f\}$ and for every $(X_i)_{i=1}^N$ let $v_- = (h_-(X_i))_{i=1}^N$ and $v_+ = (h_+(X_i))_{i=1}^N$. Also, for every $h_-$, recall that $m(h_-)$ is the smallest integer such that $2^{s/2} \|h_-\|_{\psi_2} \leq \|h_-\|_{\psi_1} \sqrt{m} \log(eN/m)$ and if the smallest is $m > N$, set $m(h_-) = N$.

Since $(X_i)_{i=1}^N \in B_t^s$ then for every $I \subset \{1, \ldots, N\}$

\[ \Bigl( \sum_{i \in I} (v_-)_i^2 \Bigr)^{1/2} \leq \kappa_1 t \bigl( 2^{s/2} \|h_-\|_{\psi_2} + \|h_-\|_{\psi_1} \sqrt{|I|} \log(eN/|I|) \bigr), \]

and let us consider two cases. The first is when $m(h_-) \leq 2^s$ and the second is when the reverse inequality holds.

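To handle the first case, when $m(h_-) \leq 2^s$, observe that by the subgaussian inequality for Bernoulli sums, for every $u > 0$, with probability at least $1 - 2\exp(-c_1 u^2 2^s)$

\[ \Bigl| \sum_{i=1}^N \varepsilon_i \bigl( (\pi_s f)^2 - (\pi_{s-1} f)^2 \bigr)(X_i) \Bigr| \leq \sum_{i=1}^{2^s} (v_- v_+)_i^* + \Bigl| \sum_{i > 2^s} \varepsilon_i (v_- v_+)_i^* \Bigr| \leq \sum_{i=1}^{2^s} (v_- v_+)_i^* + u 2^{s/2} \Bigl( \sum_{i > 2^s} \bigl( (v_- v_+)_i^* \bigr)^2 \Bigr)^{1/2}, \]

where, as always, $(x_i^*)_{i \geq 1}$ denotes a non-increasing rearrangement of $(|x_i|)_{i \geq 1}$.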

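Clearly, $\sum_{i=1}^{2^s} (v_- v_+)_i^* \leq \bigl( \sum_{i=1}^{2^s} ((v_-)_i^*)^2 \bigr)^{1/2} \bigl( \sum_{i=1}^{2^s} ((v_+)_i^*)^2 \bigr)^{1/2}$. Since $(v_+)_i \leq \max\{|(\pi_s f)(X_i)|, |(\pi_{s-1} f)(X_i)|\}$ and $(X_i)_{i=1}^N \in A_t$ then

\[ \Bigl( \sum_{i=1}^{2^s} ((v_+)_i^*)^2 \Bigr)^{1/2} \leq 2 \sup_{f \in F} \sup_{|I| \leq 2^s} \Bigl( \sum_{i \in I} f^2(X_i) \Bigr)^{1/2} \leq c_2 t \bigl( \gamma_2(F, \psi_2) + d_{\psi_1} 2^{s/2} \log(eN/2^s) \bigr). \]

Also, because $m(h_-) \leq 2^s$ and $(X_i)_{i=1}^N \in B_t^s$, it is evident that

\[ \Bigl( \sum_{i=1}^{2^s} ((v_-)_i^*)^2 \Bigr)^{1/2} \leq \kappa_1 t \bigl( 2^{s/2} \|h_-\|_{\psi_2} + \|h_-\|_{\psi_1} 2^{s/2} \log(eN/2^s) \bigr) \leq c_3 t \|h_-\|_{\psi_1} 2^{s/2} \log(eN/2^s). \]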
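Hence, recalling that

\[ 2^{s/2} \|h_-\|_{\psi_1} = 2^{s/2} \|\Delta_s(f)\|_{\psi_1} \lesssim 2^{s/2} \|\Delta_s(f)\|_{\psi_2} \lesssim \gamma_2(F, \psi_2), \]

one has

\[ \sum_{i=1}^{2^s} (v_- v_+)_i^* \leq c_4 t^2 \bigl( \gamma_2(F, \psi_2) \|h_-\|_{\psi_1} 2^{s/2} \log(eN/2^s) + d_{\psi_1} \|h_-\|_{\psi_1} 2^s \log^2(eN/2^s) \bigr) \leq 2 c_4 t^2 d_{\psi_1} \gamma_2(F, \psi_2) 2^{s/2} \log^2(eN/2^s). \tag{5.6} \]

Next, let us consider the term

\[ \Bigl( \sum_{i > 2^s} \bigl( (v_- v_+)_i^* \bigr)^2 \Bigr)^{1/2}. \]

Let $J$ be the set of the $N - 2^s$ smallest coordinates of $v_-$. Observe that for every $I \subset \{1, \ldots, N\}$, $|I| \geq 2^s$ one has

\[ \Bigl( \sum_{i \in I} (v_-)_i^2 \Bigr)^{1/2} \leq \kappa_1 t \bigl( 2^{s/2} \|h_-\|_{\psi_2} + \|h_-\|_{\psi_1} \sqrt{|I|} \log(eN/|I|) \bigr) \leq 2 \kappa_1 t \|h_-\|_{\psi_1} \sqrt{|I|} \log(eN/|I|), \]

since $m(h_-) \leq 2^s$.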

Thus, for every $i \ge 2^s$,
\[
(v_-)_i^* \le 2\kappa_1t\|h_-\|_{\psi_1}\log(eN/i),
\]
and in particular, $v_-$ is dominated by $(2\kappa_1t\|h_-\|_{\psi_1}\log(eN/i))_{i\ge2^s}$ on the set
$J$ and
\[
\|P_Jv_-\|_\infty \le 2\kappa_1t\|h_-\|_{\psi_1}\log(eN/2^s).
\]

To obtain a similar control over the vector $v_+$, let $m_0$ be the smallest integer
such that $\gamma_2(F,\psi_2) \le d_{\psi_1}\sqrt{m}\log(eN/m)$, and if the smallest one is larger
than $N$, set $m_0 = N$. Just as we did for $v_-$, if $f \in F$ and $(X_i)_{i=1}^N \in A_t$, and if
we set $(u_i)_{i=1}^N = (f(X_i))_{i=1}^N$, then for every $i \ge m_0$, $u_i^* \le 2t\kappa_1d_{\psi_1}\log(eN/i)$.
Therefore, if $I_+$ is the set of the $m_0$ largest coordinates of $(v_+)_{i=1}^N$, then
$(v_+)_{i=1}^N$ is dominated by $(2t\kappa_1d_{\psi_1}\log(eN/i))_{i\ge m_0}$ on $I_+^c$. Therefore,
\[
\Bigl(\sum_{i\in J\cap I_+^c}(v_-v_+)_i^2\Bigr)^{1/2} \le 8\kappa_1^2t^2d_{\psi_1}\|h_-\|_{\psi_1}\Bigl(\sum_{i=1}^N\log^4(eN/i)\Bigr)^{1/2}
\le c_5t^2d_{\psi_1}\|h_-\|_{\psi_1}\sqrt{N},
\]



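implying that
\[
\Bigl(\sum_{i>2^s}\bigl((v_-v_+)_i^*\bigr)^2\Bigr)^{1/2}
\le \Bigl(\sum_{i\in J\cap I_+}(v_-v_+)_i^2\Bigr)^{1/2} + \Bigl(\sum_{i\in J\cap I_+^c}(v_-v_+)_i^2\Bigr)^{1/2}
\]
\[
\le \Bigl(\sum_{i=1}^{m_0}\bigl((v_+)_i^*\bigr)^2\Bigr)^{1/2}\|P_Jv_-\|_\infty + c_5t^2\|h_-\|_{\psi_1}d_{\psi_1}\sqrt{N}
\le c_6t^2\bigl(\gamma_2(F,\psi_2)\|h_-\|_{\psi_1}\log(eN/2^s) + d_{\psi_1}\|h_-\|_{\psi_1}\sqrt{N}\bigr).
\]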
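Thus, if $m(h_-) \le 2^s$ then
\[
\sum_{i=1}^{2^s}(v_-v_+)_i^* + u2^{s/2}\Bigl(\sum_{i>2^s}\bigl((v_-v_+)_i^*\bigr)^2\Bigr)^{1/2}
\]
\[
\le c_7t^22^{s/2}\Bigl(d_{\psi_1}\gamma_2(F,\psi_2)\log^2(eN/2^s)
+ u\bigl(\gamma_2(F,\psi_2)\|h_-\|_{\psi_1}\log(eN/2^s) + d_{\psi_1}\|h_-\|_{\psi_1}\sqrt{N}\bigr)\Bigr)
\]
\[
\le c_8ut^2d_{\psi_1}\bigl(\gamma_2(F,\psi_2)2^{s/2}\log^2(eN/2^s) + 2^{s/2}\|h_-\|_{\psi_1}\sqrt{N}\bigr), \qquad (5.7)
\]
provided that $u \ge 1$.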

Next, we turn to the case when $m(h_-) > 2^s$. Let $I_-$ be the set of the
$m(h_-)$ largest coordinates of $v_-$ and again, $I_+$ is the set of the $m_0$ largest
coordinates of $v_+$. Therefore, if $I \subset \{1,\dots,N\}$ and $|I| \ge m(h_-)$ then
\[
\Bigl(\sum_{i\in I}(v_-)_i^2\Bigr)^{1/2} \le 2\kappa_1t\|h_-\|_{\psi_1}\sqrt{|I|}\log(eN/|I|),
\]
and thus, for $i \ge m(h_-)$,
\[
(v_-)_i^* \le 2\kappa_1t\|h_-\|_{\psi_1}\log(eN/i).
\]
Also, from the definition of $m(h_-)$ it is evident that
\[
\Bigl(\sum_{i=1}^{m(h_-)}\bigl((v_-)_i^*\bigr)^2\Bigr)^{1/2} \le 4\kappa_1t2^{s/2}\|h_-\|_{\psi_2},
\]
implying that
\[
\|P_{I_-^c}v_-\|_\infty \le 4\kappa_1t\bigl(2^s/m(h_-)\bigr)^{1/2}\|h_-\|_{\psi_2} \le c_9t\|h_-\|_{\psi_2},
\]
because $2^s < m(h_-)$.

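Set $I = I_- \cup I_+ = (I_+\setminus I_-) \cup I_-$. Note that
\[
\Bigl|\sum_{i\in I_-}\varepsilon_i(v_-v_+)_i\Bigr| \le \Bigl(\sum_{i\in I_-}(v_-)_i^2\Bigr)^{1/2}\Bigl(\sum_{i\in I_-}(v_+)_i^2\Bigr)^{1/2}
\le c_{10}t^22^{s/2}\|h_-\|_{\psi_2}\bigl(\gamma_2(F,\psi_2) + d_{\psi_1}\sqrt{N}\bigr),
\]
where we have used that
\[
\Bigl(\sum_{i\in I_-}(v_+)_i^2\Bigr)^{1/2} \le 2\sup_{f\in F}\Bigl(\sum_{i=1}^Nf^2(X_i)\Bigr)^{1/2}.
\]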



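Moreover, applying the bound on $\|P_{I_-^c}v_-\|_\infty$,
\[
\Bigl(\sum_{i\in I_+\setminus I_-}(v_-v_+)_i^2\Bigr)^{1/2} \le \|P_{I_-^c}v_-\|_\infty\Bigl(\sum_{i\in I_+}(v_+)_i^2\Bigr)^{1/2}
\le c_{11}t^2\|h_-\|_{\psi_2}\bigl(\gamma_2(F,\psi_2) + d_{\psi_1}\sqrt{N}\bigr).
\]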
That leaves us with the coordinates that are outside $I$, that is, outside both
$I_-$ and $I_+$. Observe that $v_-$ is dominated on $I^c$ by $(2\kappa_1t\|h_-\|_{\psi_1}\log(eN/i))_{i=1}^N$
and $v_+$ is dominated on $I^c$ by $(2\kappa_1td_{\psi_1}\log(eN/i))_{i=1}^N$. Hence,
\[
\Bigl(\sum_{i\in I^c}(v_-v_+)_i^2\Bigr)^{1/2} \le 4\kappa_1^2t^2\|h_-\|_{\psi_1}d_{\psi_1}\Bigl(\sum_{i=1}^N\log^4(eN/i)\Bigr)^{1/2}
\le c_{11}t^2\|h_-\|_{\psi_1}d_{\psi_1}\sqrt{N}.
\]
Therefore, if $m(h_-) > 2^s$, then with $(\varepsilon_i)_{i=1}^N$-probability at least $1 - 2\exp(-c_{12}u^22^s)$,
\[
\Bigl|\sum_{i=1}^N\varepsilon_i(v_-v_+)_i\Bigr| \le c_{13}ut^22^{s/2}\Bigl(\|h_-\|_{\psi_2}\bigl(\gamma_2(F,\psi_2) + d_{\psi_1}\sqrt{N}\bigr) + \|h_-\|_{\psi_1}d_{\psi_1}\sqrt{N}\Bigr)
\le c_{14}ut^22^{s/2}\|h_-\|_{\psi_2}\bigl(\gamma_2(F,\psi_2) + d_{\psi_1}\sqrt{N}\bigr), \qquad (5.8)
\]
provided that $u \ge 1$.

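Combining (5.7) and (5.8), and since there are at most $2^{2^{s+1}}$ links at the
$s$-level, it is evident that for every $t, u \ge c_{15}$ and every $s \le \tau_N$,
\[
Pr_\varepsilon\Bigl(\exists f \in F : \Bigl|\sum_{i=1}^N\varepsilon_i\bigl((\pi_s(f))^2(X_i) - (\pi_{s-1}(f))^2(X_i)\bigr)\Bigr| > \rho(s,f)u\Bigr) \le 2\exp(-c_{16}u^22^s),
\]
where
\[
\rho(s,f) \sim t^2d_{\psi_1}\bigl(\gamma_2(F,\psi_2)2^{s/2}\log^2(eN/2^s) + 2^{s/2}\|\Delta_s(f)\|_{\psi_1}\sqrt{N}\bigr)
+ t^2\bigl(2^{s/2}\|\Delta_s(f)\|_{\psi_2}\bigr)\bigl(\gamma_2(F,\psi_2) + d_{\psi_1}\sqrt{N}\bigr).
\]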

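It remains to show that for every $f \in F$
\[
\sum_{\{s : 2^s \le N\}}\rho(s,f) \le c\,t^2\bigl(\sqrt{N}d_{\psi_1}\gamma_2(F,\psi_2) + \gamma_2^2(F,\psi_2)\bigr), \qquad (5.9)
\]
which is straightforward because for an almost optimal admissible sequence,
\[
\sum_{s\ge1}2^{s/2}\|\Delta_s(f)\|_{\psi_2} \le 2\gamma_2(F,\psi_2) \quad\text{and}\quad \sum_{\{s : s\le\tau_N\}}2^{s/2}\log^2(eN/2^s) \sim \sqrt{N}.
\]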
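The second equivalence used above can be checked numerically. The following is a minimal sketch, assuming that the sum runs over all $s \ge 1$ with $2^s \le N$ (the function name `scaled_chain_sum` is hypothetical and not part of the original argument). It shows that the sum divided by $\sqrt{N}$ stays bounded as $N$ grows, because the terms near $2^s \sim N$ dominate and the logarithmic factor is controlled by the geometric decay.

```python
import math


def scaled_chain_sum(N):
    """Return sum over {s >= 1 : 2^s <= N} of 2^(s/2) * log^2(eN / 2^s), divided by sqrt(N)."""
    total = 0.0
    s = 1
    while 2 ** s <= N:
        total += 2 ** (s / 2) * math.log(math.e * N / 2 ** s) ** 2
        s += 1
    return total / math.sqrt(N)


if __name__ == "__main__":
    # The ratio should stabilise instead of growing like a power of log N.
    for m in (10, 20, 30):
        print(m, round(scaled_chain_sum(2 ** m), 2))
```

The bounded ratio is only an illustration of the order of magnitude; the absolute constant it suggests plays no role in the proof.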



We need an additional preliminary result which allows one to move freely
between the empirical process and the Bernoulli one, namely the Giné-Zinn
symmetrization theorem [17]:

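Theorem 5.4 Let $H$ be a class of functions and set $\sigma^2 = \sup_{h\in H}E(h - Eh)^2$. For every integer $N$ and any $t \ge 2^{1/2}\sigma N^{1/2}$,
\[
Pr\Bigl(\sup_{h\in H}\Bigl|\sum_{i=1}^N(h(X_i) - Eh)\Bigr| > t\Bigr) \le 4Pr\Bigl(\sup_{h\in H}\Bigl|\sum_{i=1}^N\varepsilon_ih(X_i)\Bigr| > t/4\Bigr).
\]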



Combining Theorem 5.4 and Theorem 5.2 we obtain the next result on
the "beginning" of every chain.

Theorem 5.5 There exist absolute constants $c_1, c_2$ and $c_3$ for which the
following holds. Let $F$ be a class of mean-zero functions and let $(F_s)_{s\ge0}$ be
an almost optimal admissible sequence with respect to the $\psi_2$ norm. Then,
for every $f_0 \in F$ and $x \ge c_1$, with probability at least $1 - 2\exp(-c_2x^{2/5})$,
\[
\sup_{f\in F}\Bigl|\sum_{i=1}^N\Bigl((\pi_{\tau_N}f)^2(X_i) - f_0^2(X_i) - E\bigl((\pi_{\tau_N}f)^2 - f_0^2\bigr)\Bigr)\Bigr|
\le c_3x\bigl(\sqrt{N}d_{\psi_1}\gamma_2(F,\psi_2) + \gamma_2^2(F,\psi_2)\bigr).
\]



Remark 5.6 The power of $x^{2/5}$ in the exponent is likely to be an artifact of
the proof. We made no effort to optimize this power since it is not of major
importance in the problems we wish to address, and because any exponential
tail estimate would give us the integrability properties we need.

Proof. Fix $f_0 \in F$ and let $H = \{(\pi_{\tau_N}f)^2 - f_0^2 : f \in F\}$. It is standard
to verify that $\sigma^2 = \sup_{h\in H}E(h - Eh)^2 \le c_0^2d_{\psi_1}^4$. Since $F$ is symmetric then
$\gamma_2(F,\psi_2) \ge d_{\psi_1}$ and $x\sqrt{N}d_{\psi_1}\gamma_2(F,\psi_2) \ge 2\sqrt{N}\sigma$ provided that $x \ge 2c_0$.

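If we set $\rho = \sqrt{N}d_{\psi_1}\gamma_2(F,\psi_2) + \gamma_2^2(F,\psi_2)$ then by Theorem 5.4 and
the definition of $H$, for every $x \ge 2c_0$,
\[
Pr\Bigl(\sup_{f\in F}\Bigl|\sum_{i=1}^N\Bigl((\pi_{\tau_N}f)^2(X_i) - f_0^2(X_i) - E\bigl((\pi_{\tau_N}f)^2 - f_0^2\bigr)\Bigr)\Bigr| > x\rho\Bigr)
\]
\[
\le 4Pr\Bigl(\sup_{f\in F}\Bigl|\sum_{i=1}^N\varepsilon_i\bigl((\pi_{\tau_N}f)^2(X_i) - f_0^2(X_i)\bigr)\Bigr| > x\rho/4\Bigr)
= 4E_XPr_\varepsilon\Bigl(\sup_{f\in F}\Bigl|\sum_{i=1}^N\varepsilon_i\bigl((\pi_{\tau_N}f)^2(X_i) - f_0^2(X_i)\bigr)\Bigr| > x\rho/4\Bigr)
\]
by Fubini's Theorem. Using the notation of (5.4) and (5.5), for $t \ge c_1$, let
$\Omega_t = A_t \cap \bigl(\bigcap_{s\le\tau_N}B_t^s\bigr)$ and observe that
\[
Pr(\Omega_t^c) \le Pr(A_t^c) + \sum_{s=1}^{\tau_N}Pr\bigl((B_t^s)^c\bigr) \le 2\exp(-c_2t\log N).
\]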



Thus, if we set $u_t = x/t^2$, then as long as $u_t \ge c_4$ (or in other words, for
every $t$ such that $x \ge c_4t^2$), Theorem 5.2 implies that
\[
E_XPr_\varepsilon\Bigl(\sup_{f\in F}\Bigl|\sum_{i=1}^N\varepsilon_i\bigl((\pi_{\tau_N}f)^2(X_i) - f_0^2(X_i)\bigr)\Bigr| > x\rho/4\Bigr)
\le 2\bigl(\exp(-c_5x^2/t^4) + \exp(-c_5t)\bigr) \le 2\exp(-c_6x^{2/5}),
\]
where the last inequality holds if we take $t = x^{2/5} \ge 1$.
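The choice $t = x^{2/5}$ is the one that balances the two exponents $x^2/t^4$ and $t$. The short sketch below illustrates this, assuming for simplicity that the constant $c_5$ equals 1; the helper names `tail_exponents` and `best_t` are hypothetical and serve only this illustration.

```python
def tail_exponents(x, t):
    """Exponents of the two tail terms exp(-x^2/t^4) and exp(-t), with c_5 = 1 assumed."""
    return x ** 2 / t ** 4, t


def best_t(x, grid_size=10000):
    """Grid search over t in [1, x] for the value maximising min(x^2/t^4, t)."""
    grid = (1 + (x - 1) * k / grid_size for k in range(grid_size + 1))
    return max(grid, key=lambda t: min(tail_exponents(x, t)))


if __name__ == "__main__":
    x = 32.0
    t = x ** 0.4  # the choice t = x^(2/5) made in the proof
    print(tail_exponents(x, t))  # both exponents are (numerically) x^(2/5)
    print(best_t(x))             # the grid maximiser lies close to x^(2/5)
```

Any other choice of $t$ makes one of the two exponents smaller than $x^{2/5}$, which is why this power appears in the final tail estimate.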
The last component in the proof of Theorem A is an estimate on the
"end" of each chain, that is, $f - \pi_{\tau_N}f = \sum_{s>\tau_N}\Delta_s(f)$. Its proof is a
combination of Bernstein's inequality and a chaining argument (see Lemma
1.5 in [2l]), and the key point is the observation that for every $f, g$ and
every $u \ge 1$, with probability at least $1 - 2\exp(-cNu^2)$, $(P_N(f - g)^2)^{1/2} \le u\|f - g\|_{\psi_2}$. In particular one has

Lemma 5.7 \2J$ There exist absolute constants $c_1, c_2, c_3$ and $c_4$ for which
the following holds. Let $(F_s)_{s\ge0}$ be an almost optimal admissible sequence
of $F$ with respect to the $\psi_2$ norm. Then, for every $u \ge c_1$, with probability
at least $1 - 2\exp(-c_2Nu^2)$,
\[
\sup_{f\in F}\bigl(P_N(f - \pi_{\tau_N}(f))^2\bigr)^{1/2} \le c_3u\frac{\gamma_2(F,\psi_2)}{\sqrt{N}}
\]
and
\[
E\sup_{f\in F}\bigl(P_N(f - \pi_{\tau_N}(f))^2\bigr)^{1/2} \le c_4\frac{\gamma_2(F,\psi_2)}{\sqrt{N}}.
\]



Finally, let us reformulate Theorem A. 



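Theorem 5.8 There exist absolute constants $c_1, c_2, c_3$ and $c_4$ for which the
following holds. If $F$ is a symmetric class of mean-zero functions, then for
every $x \ge c_1$, with probability at least $1 - 2\exp(-c_2x^{2/5})$,
\[
\sup_{f\in F}\Bigl|\frac{1}{N}\sum_{i=1}^Nf^2(X_i) - Ef^2\Bigr| \le c_3x\Bigl(d_{\psi_1}\frac{\gamma_2(F,\psi_2)}{\sqrt{N}} + \frac{\gamma_2^2(F,\psi_2)}{N}\Bigr).
\]
In particular,
\[
E\sup_{f\in F}\Bigl|\frac{1}{N}\sum_{i=1}^Nf^2(X_i) - Ef^2\Bigr| \le c_4\Bigl(d_{\psi_1}\frac{\gamma_2(F,\psi_2)}{\sqrt{N}} + \frac{\gamma_2^2(F,\psi_2)}{N}\Bigr).
\]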
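Proof. Let $(F_s)_{s\ge0}$ and $\tau_N$ be as above. Then, for every $f \in F$,
\[
\sum_{i=1}^N\bigl(f^2(X_i) - Ef^2\bigr) = \sum_{i=1}^N\bigl(f^2(X_i) - (\pi_{\tau_N}f)^2(X_i)\bigr)
+ \sum_{i=1}^N\bigl((\pi_{\tau_N}f)^2(X_i) - E(\pi_{\tau_N}f)^2\bigr) + NE\bigl((\pi_{\tau_N}f)^2 - f^2\bigr)
\]
\[
\le 2\Bigl(\sum_{i=1}^N(f - \pi_{\tau_N}f)^2(X_i)\Bigr)^{1/2}\sup_{f\in F}\Bigl(\sum_{i=1}^Nf^2(X_i)\Bigr)^{1/2}
+ 2N\sup_{f\in F}\bigl(E(f - \pi_{\tau_N}f)^2\bigr)^{1/2}\cdot\sup_{f\in F}\bigl(Ef^2\bigr)^{1/2}
\]
\[
+ \sup_{f\in F}\Bigl|\sum_{i=1}^N\bigl((\pi_{\tau_N}f)^2(X_i) - E(\pi_{\tau_N}f)^2\bigr)\Bigr|.
\]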

By Lemma 5.7 combined with Theorem 3.4, with probability at least $1 - 2\exp(-c_1Nt) - 2\exp(-t^{1/2}\log N)$ the first and second terms are at most
\[
c_2t\gamma_2(F,\psi_2)\bigl(\gamma_2(F,\psi_2) + d_{\psi_1}\sqrt{N}\bigr)
\]
for $t \ge c_3$. The third term may be bounded using Theorem 5.5. Indeed, for
every such $t$,
\[
\sup_{f\in F}\Bigl|\sum_{i=1}^N\Bigl(\bigl((\pi_{\tau_N}f)^2(X_i) - f_0^2(X_i)\bigr) - E\bigl((\pi_{\tau_N}f)^2 - f_0^2\bigr)\Bigr)\Bigr|
\le c_4t\gamma_2(F,\psi_2)\bigl(\gamma_2(F,\psi_2) + d_{\psi_1}\sqrt{N}\bigr)
\]
with probability at least $1 - 2\exp(-c_5t^{2/5})$.

Finally, a similar argument to the one used in the proof of Theorem 
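shows that for every such $t$, with probability at least $1 - 2\exp(-c_5t^{2/5})$,
\[
\Bigl|\sum_{i=1}^N\bigl(f_0^2(X_i) - Ef_0^2\bigr)\Bigr| \le c_6t\bigl(d_{\psi_1}^2 + d_{\psi_1}^2\sqrt{N}\bigr)
\le c_6t\bigl(\gamma_2^2(F,\psi_2) + d_{\psi_1}\gamma_2(F,\psi_2)\sqrt{N}\bigr).
\]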
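Hence, with probability at least $1 - 4\bigl(\exp(-c_1t^{1/2}\log N) + \exp(-c_5t^{2/5})\bigr)$,
\[
\sup_{f\in F}\Bigl|\frac{1}{N}\sum_{i=1}^Nf^2(X_i) - Ef^2\Bigr| \le c_7t\Bigl(d_{\psi_1}\frac{\gamma_2(F,\psi_2)}{\sqrt{N}} + \frac{\gamma_2^2(F,\psi_2)}{N}\Bigr),
\]
as required.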

The claim regarding the expectation follows from an integration argu- 
ment and is omitted. ■ 



6 Applications 

In this final section we will present several geometric applications of our 
three main results, though as pointed out in the introduction, there are 
numerous other applications in Empirical Processes Theory, Nonparametric 
Statistics and Asymptotic Geometric Analysis that will not be mentioned 
here. 

It is well known that many results in Asymptotic Geometric Analysis are
based on a random selection argument, for example, a random choice of a
section or of a projection of a convex body in $\mathbb{R}^n$. Historically, the motivation
was to understand the geometry of convex bodies and thus the models of
random selection that had been studied were rather limited. Indeed, in
classical results such as Dvoretzky's Theorem, low-$M^*$ estimates and many
others (see, e.g. [27], [31]), the selection was performed using a random point
on a Grassmann manifold $G_{n,k}$ relative to the Haar measure, or by applying
a gaussian operator $\sum_{i=1}^k\langle G_i,\cdot\rangle e_i$ to the given body, with $(G_i)_{i=1}^k$ selected
independently according to the canonical gaussian measure on $\mathbb{R}^n$.

In recent years, the distribution of volume in a convex body has become
a central area of interest in Asymptotic Geometric Analysis. Hence, it is
natural to ask whether the classical results in the area can be extended
to other random selection methods, endowed by these volume measures, or,
more generally, by isotropic, log-concave measures. It is, perhaps, surprising
that extending the classical gaussian-based results even to natural subgaussian
selection methods, for example, the uniform measure on $\{-1,1\}^n$, is not
simple at all, and in some cases the extension is simply not true. Moreover,
going beyond the subgaussian realm and proving such results for arbitrary
isotropic, log-concave measures is even more difficult, mainly because the
tail estimate that one has for linear functionals is rather weak. Indeed, in
the isotropic, log-concave case the $\psi_1$ and the $L_2$ norms are equivalent, but
$\|\langle x,\cdot\rangle\|_{\psi_2}$ might have a strong dependence on the dimension.

Here, we will study the way a random operator $\Gamma = \sum_{i=1}^N\langle X_i,\cdot\rangle e_i$ acts
on a convex body, where $(X_i)_{i=1}^N$ are selected according to an isotropic,
log-concave measure on $\mathbb{R}^n$. We will show that many parts of the gaussian
theory remain true for such an operator, with the main difference being that
the classical parameter
\[
\sqrt{n}M^*(K) = \sqrt{n}\int_{S^{n-1}}\|x\|_{K^\circ}\,d\sigma \sim E\sup_{x\in K}\sum_{i=1}^ng_ix_i
\]
that is used to quantify the phenomena one sees for a gaussian operator
is replaced by $\gamma_2(K,\psi_2)$ (and recall that $(K,\psi_2)$ is the set of functions
$\{\langle x,\cdot\rangle : x \in K\}$ endowed with the $\psi_2(\mu)$ norm). Another difference is that
the probabilistic estimates we will obtain for a general random, isotropic,
log-concave operator are much weaker than in the gaussian or subgaussian
cases.

Assume that $K \subset \mathbb{R}^n$ is symmetric. Then $d_{\psi_\alpha} \sim \mathrm{diam}(K,\psi_\alpha)$ and
$d_{L_2} \sim \mathrm{diam}(K,\ell_2^n)$. For $\alpha = 1,2$ and an isotropic measure $\mu$, let $Q_\alpha(\mu) = \sup_{\theta\in S^{n-1}}\|\langle\theta,\cdot\rangle\|_{\psi_\alpha}$, the equivalence constant between the $\psi_\alpha$ norm restricted
to linear functionals on $\mathbb{R}^n$ and the $\ell_2^n$ norm. For example, if $\mu$ is an
isotropic, log-concave measure on $\mathbb{R}^n$ then by Borell's inequality, $Q_1(\mu) \sim 1$.
On the other hand, $Q_2(\mu)$ can grow polynomially in $n$.

6.1 The norm of random matrices 

Let $K \subset \mathbb{R}^n$ be a convex body and let $\Gamma : \mathbb{R}^n \to \mathbb{R}^N$ be the random operator
$\sum_{i=1}^N\langle X_i,\cdot\rangle e_i$, where $(X_i)_{i=1}^N$ are independent, selected according to an
isotropic, log-concave measure on $\mathbb{R}^n$. Our goal is to estimate $E\|\Gamma\|_{K\to\ell_p^N}$,
and for the sake of brevity we will consider the case $p \ge 2$, although the
case $1 \le p < 2$ can be handled using similar means.

Let us begin with the relatively simple subgaussian case, when $Q_2(\mu) \sim 1$.

Theorem 6.1 There exists an absolute constant $c$ for which the following
holds. If $p \ge 2$ and $K \subset \mathbb{R}^n$ is a convex body, then for every integer $N$,
\[
E\|\Gamma\|_{K\to\ell_p^N} \le c\bigl(\gamma_2(K,\psi_2) + Q_2(\mu)\,\mathrm{diam}(K,\ell_2^n)\cdot N^{1/p}\bigr).
\]

Since the proof of Theorem 6.1 is rather standard, we will only sketch it
here.

Proof. Let $p'$ be the conjugate index of $p$. Consider the random process
indexed by $K \times B_{p'}^N$, defined by $Z_{x,y} = \sum_{i=1}^N\langle X_i,x\rangle y_i$ and note that for
every $(x,y)$ and $(x',y')$,
\[
\|Z_{x,y} - Z_{x',y'}\|_{\psi_2} \le d_{\psi_2}\|y - y'\|_2 + \mathrm{diam}(B_{p'}^N,\ell_2^N)\|\langle X,x - x'\rangle\|_{\psi_2}.
\]
Therefore, applying a chaining argument,
\[
E\sup_{x\in K,\ y\in B_{p'}^N}Z_{x,y} \le c_1\bigl(d_{\psi_2}\gamma_2(B_{p'}^N,\ell_2^N) + \mathrm{diam}(B_{p'}^N,\ell_2^N)\gamma_2(K,\psi_2)\bigr).
\]
To complete the proof, if $G = (g_1,\dots,g_N)$ is the standard gaussian vector in
$\mathbb{R}^N$ then by the Majorizing Measures Theorem,
\[
\gamma_2(B_{p'}^N,\ell_2^N) \le c_2E\sup_{y\in B_{p'}^N}\sum_{i=1}^Ng_iy_i = c_2E\|G\|_{\ell_p^N} \le c_3N^{1/p}.
\]
Also, since $p \ge 2$ then $\mathrm{diam}(B_{p'}^N,\ell_2^N) = 1$ and clearly $d_{\psi_2} \le Q_2(\mu)d_{L_2}$.
Therefore,
\[
E\|\Gamma\|_{K\to\ell_p^N} \le c\bigl(\gamma_2(K,\psi_2) + Q_2(\mu)d_{L_2}N^{1/p}\bigr),
\]
as claimed. ■

It is simple to verify that Theorem 6.1 cannot be improved, up to the
constants involved. Indeed, if $\mu$ is the standard gaussian measure on $\mathbb{R}^n$ then
$Q_2(\mu)$ is an absolute constant and $\gamma_2(K,\psi_2) \sim \gamma_2(K,L_2) = \gamma_2(K,\ell_2^n)$. Let
$(G_i)_{i=1}^N$ be independent copies distributed according to $\mu$ and since $e_1 \in B_{p'}^N$
then
\[
E\|\Gamma\|_{K\to\ell_p^N} = E\sup_{x\in K}\sup_{y\in B_{p'}^N}\sum_{i=1}^N\langle G_i,x\rangle y_i \ge E\sup_{x\in K}\sum_{i=1}^ng_ix_i.
\]
Also, if $x_0 \in K$ satisfies $\|x_0\|_2 = d_{L_2}$ then
\[
E\|\Gamma\|_{K\to\ell_p^N} \ge E\|\Gamma x_0\|_{\ell_p^N} \ge c_2\|x_0\|_2N^{1/p},
\]
showing that the estimate in Theorem 6.1 is sharp in this case.

Thanks to Theorem B it is possible to replace $Q_2(\mu)$ in Theorem 6.1 by
$Q_1(\mu)$, which, in the log-concave case, is of the order of an absolute constant.



Theorem 6.2 There exists an absolute constant $c$ for which the following
holds. Let $K$ be a convex body in $\mathbb{R}^n$. Then for every $p \ge 2$ and any integer
$N$, a random isotropic, log-concave operator $\Gamma$ satisfies
\[
E\|\Gamma\|_{K\to\ell_p^N} \le c\bigl(\gamma_2(K,\psi_2) + \mathrm{diam}(K,\ell_2^n)\cdot N^{1/p}\bigr).
\]

Proof. Since $\mathrm{diam}(K,\psi_1) \sim \mathrm{diam}(K,\ell_2^n)$, the claim follows immediately
from Theorem B and its extensions to other $\ell_p$ norms for $m = N$ and
$F = \{\langle x,\cdot\rangle : x \in K\}$.



An interesting case in which Theorem 6.2 can be used is the "standard
shrinking" phenomenon. Simply put, standard shrinking is the observation
that for every $x \in \mathbb{R}^n$, and with high probability with respect to the uniform
measure on the Grassmann manifold $G_{n,k}$, the random orthogonal projection
$P_E$ satisfies $\|P_Ex\|_2 \le c\sqrt{k/n}\,\|x\|_2$. This property can be extended
to a more general situation. Indeed, one can show that if $K \subset \mathbb{R}^n$ is a
convex body, $k^* = (\sqrt{n}M^*(K)/d_{L_2})^2$ and $k \ge c_1k^*$, then with high probability
in $G_{n,k}$, $\mathrm{diam}(P_EK,\ell_2^n) \le c_2d_{L_2}\sqrt{k/n}$. Moreover, this result is sharp, since
Milman's version of Dvoretzky's Theorem (see, for example, [27]) implies
that if $k \le c_3k^*$, then with high probability $P_EK \supset c_4M^*(K)B_2^k$, and the
diameter cannot decrease further.
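The simplest instance of standard shrinking can be simulated directly. The sketch below is only an illustration under stated assumptions: instead of a random orthogonal projection it uses the normalized gaussian operator $A = G/\sqrt{n}$, where $G$ is a $k \times n$ matrix with independent standard gaussian entries, and it fixes a single unit vector $x$. The function name `mean_squared_shrink` is hypothetical. For this model $E\|Ax\|_2^2 = (k/n)\|x\|_2^2$, so the average of $\|Ax\|_2^2$ should be close to $k/n$.

```python
import math
import random


def mean_squared_shrink(n, k, trials, seed=0):
    """Monte Carlo average of ||A x||_2^2 for A = G / sqrt(n), G a k-by-n gaussian matrix.

    x is the fixed unit vector with all coordinates equal to 1/sqrt(n).
    """
    rng = random.Random(seed)
    x = [1.0 / math.sqrt(n)] * n
    acc = 0.0
    for _ in range(trials):
        sq_norm = 0.0
        for _ in range(k):
            # One row of G applied to x.
            row_dot = sum(rng.gauss(0.0, 1.0) * xj for xj in x)
            sq_norm += row_dot ** 2
        acc += sq_norm / n  # ||A x||_2^2 for this sample of G
    return acc / trials


if __name__ == "__main__":
    n, k = 50, 10
    print(mean_squared_shrink(n, k, trials=200), k / n)
```

The simulation addresses a single point only; the content of the results discussed here is that the same $\sqrt{k/n}$ factor holds uniformly over a convex body once $k$ exceeds the critical dimension.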

The shrinking of the diameter for $k \ge k^*$ extends to other random operators,
but even in a relatively simple case, when $\Gamma$ is selected according to
the uniform measure on $\{-1,1\}^n$, some nontrivial machinery is required [3],
particularly if one wishes to recover the probabilistic estimate $\sim \exp(-ck)$.
The methods developed in [24] (see Corollary 1.9 there) show that the same
is true, and with the same probability estimate, as long as $Q_2(\mu) \sim 1$.

Theorem 6.2 implies that shrinking does happen for a random isotropic,
log-concave operator, though with a weaker probabilistic estimate. Indeed,
consider the operator $A = \Gamma/\sqrt{n}$, let $K \subset \mathbb{R}^n$ be a convex body and set
$k' = (\gamma_2(K,\psi_2)/d_{L_2})^2$. Then, with high probability,
\[
\mathrm{diam}(AK,\ell_2^k) \le \frac{c}{\sqrt{n}}\bigl(\gamma_2(K,\psi_2) + \mathrm{diam}(K,\ell_2^n)\sqrt{k}\bigr) \le c'\sqrt{\frac{k}{n}}\,\mathrm{diam}(K,\ell_2^n),
\]
as long as $k \ge k'$. Since
\[
c_1\sqrt{n}M^*(K) \le \gamma_2(K,\psi_2) \le c_2Q_2(\mu)\sqrt{n}M^*(K),
\]
it follows that if $\mu$ happens to be subgaussian, i.e. if $Q_2(\mu) \sim 1$, then $k'$ and
$k^*$ are equivalent.

6.2 Low-M* estimates 

Given a convex body $K \subset \mathbb{R}^n$ and $k \le n$, one would like to find a subspace
$E \subset \mathbb{R}^n$ for which the Euclidean diameter of $K \cap E$ is as small as possible.
We refer the reader to [26, 21] for a brief description of the progress made
on this problem.

In [28, 29] it was shown that if $E$ is the kernel of a random orthogonal
projection (or of a gaussian projection), and if
\[
r_N^* = \inf\bigl\{r > 0 : \sqrt{n}M^*(K \cap rS^{n-1})/\sqrt{N} \le cr\bigr\}, \qquad (6.1)
\]
then $\mathrm{diam}(E \cap K) \le r_N^*$, where $c$ is an absolute constant.

Since the original proof of this result is based on the structure of gaussian
variables or that of the Haar measure on $G_{n,k}$, extending it to other
natural random operators is not trivial. Equation (6.1) was extended to
the subgaussian case in [23] using a subgaussian version of Theorem A. It
was shown that if $\mu$ is isotropic and $\Gamma = \sum_{i=1}^N\langle X_i,\cdot\rangle e_i$ (with $X_1,\dots,X_N$
independent, distributed according to $\mu$), then with high probability,
\[
\mathrm{diam}(K \cap \ker\Gamma) \le \inf\bigl\{r > 0 : Q_2(\mu)\gamma_2(K \cap rS^{n-1},\psi_2)/\sqrt{N} \le cr\bigr\}. \qquad (6.2)
\]
Therefore, if $\mu$ is isotropic and $Q_2(\mu) \sim 1$ (that is, if $\Gamma$ is an isotropic,
subgaussian operator) then (6.1) is true. Applying Theorem A, the fact
that for an isotropic, log-concave measure $Q_1(\mu) \sim 1$ and the proof from
J, one has 



Theorem 6.3 There exist absolute constants $c$ and $c_1$ for which the following
holds. Let $\Gamma : \ell_2^n \to \ell_2^N$ be a random isotropic log-concave operator. Then
for a convex body $K \subset \mathbb{R}^n$ one has
\[
E\bigl(\mathrm{diam}(K \cap \ker\Gamma)\bigr) \le c_1\inf\bigl\{r > 0 : \gamma_2(K \cap rS^{n-1},\psi_2)/\sqrt{N} \le cr\bigr\},
\]
and a similar estimate holds with high probability.

Again, Theorem 6.3 extends the classical result to any isotropic log-concave
ensemble, with $\gamma_2(K,\psi_2)$ taking the place of $\sqrt{n}M^*(K)$,
though with a weaker probabilistic estimate.



6.3 The process indexed by $S^{n-1}$

This section is devoted to a problem that is far from being fully solved: the
behavior of the process
\[
\sup_{\theta\in S^{n-1}}\Bigl|\frac{1}{N}\sum_{i=1}^N\langle X_i,\theta\rangle^2 - 1\Bigr|, \qquad (6.3)
\]
where $X_1,\dots,X_N$ are selected independently according to an isotropic,
log-concave measure on $\mathbb{R}^n$.

In [1] the authors solved the following facet of this problem: Given $\varepsilon > 0$
and $0 < \delta < 1$, how many random points $X_1,\dots,X_N$ are needed to ensure
that with probability $1 - \delta$,
\[
\sup_{\theta\in S^{n-1}}\Bigl|\frac{1}{N}\sum_{i=1}^N\langle X_i,\theta\rangle^2 - 1\Bigr| \le \varepsilon?
\]
An equivalent formulation of this question is to find the smallest $N$ that
would still guarantee that a random, isotropic, log-concave operator $\Gamma$ embeds
$\ell_2^n$ in $\ell_2^N$ $(1+\varepsilon)$-isomorphically.

This problem has been studied extensively in recent years (e.g. [211 El 
[T5| I52"| |2"U| \3U\ |2"5| 0]), in which the estimate has been improved from the 
initial $N \ge c(\varepsilon,\delta)n^2$ in [21] to the best possible estimate of $N \ge c(\varepsilon,\delta)n$,
proved in [1]. In fact, what was actually proved in [1] is the following:

Theorem 6.4 There exist absolute constants $C$, $c$ and $c_1$ for which the
following holds. Let $\mu$ be an isotropic, log-concave measure on $\mathbb{R}^n$ and let
$(X_i)_{i=1}^N$ be independent, distributed according to $\mu$. Then, for every $t \ge 1$
and every $1 \le N \le \exp(\sqrt{n})$, with probability at least $1 - 2\exp(-ct\sqrt{n})$, for
every $I \subset \{1,\dots,N\}$,
\[
\sup_{\theta\in S^{n-1}}\Bigl(\sum_{i\in I}\langle\theta,X_i\rangle^2\Bigr)^{1/2} \le Ct\bigl(\sqrt{n} + \sqrt{|I|}\log(eN/|I|)\bigr). \qquad (6.4)
\]
Moreover, for every $c_1n \le N \le \exp(\sqrt{n})$ and every $s, t \ge 2$,

N 



sup 



1 J2& X t f - 1 < C (ts^\og{eN/n) + s^J (6.5) 



with probability at least $1 - 2\exp(-cs\sqrt{n}) - 2\exp(-c\min\{u,v\})$, where $u = t^2s^2n\log^2(eN/n)$ and $v = (t/s)\sqrt{nN}/\log(eN/n)$.

Although Theorem 6.4 beautifully resolves the case $N \sim n$, its proof has
certain weaknesses from the point of view of empirical processes theory and
the general understanding of the process (6.3). First of all, (6.5) is derived
from (6.4) using a decomposition and contraction argument, just like our
Theorem C is derived from Theorem B. Hence, there is an intrinsic logarithmic
looseness in (6.5): a superfluous factor of $\log N$ for $N \ge c(\beta)n^{1+\beta}$ for
any $\beta > 0$.

Second, the proof of Theorem 6.4 relies on the Euclidean nature of the
problem in a very strong way: that the given class is a class of linear functionals
on $\mathbb{R}^n$, that the indexing set is the entire sphere and that the measure is
isotropic, log-concave (in particular, that $Q_1(\mu) \sim 1$ and that the Euclidean
norm of a random point concentrates around $\sqrt{n}$). Hence, the method of [1]
cannot be extended beyond this limited setup, even to obtain an analogous
result for a small subset of the sphere as an indexing class. Naturally, it is
also impossible to obtain an "empirical processes" result like Theorem A in
this way. A consequence of this limitation is that the method of [1] cannot
be used to prove the applications presented in the two previous sections (i.e.,
estimates on the norm $\|\Gamma\|_{K\to\ell_p^N}$, the shrinking phenomenon, and low-$M^*$
estimates) since those applications require accurate information on the way
$\Gamma$ acts on arbitrary subsets of $\mathbb{R}^n$ rather than on the entire sphere.

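Process (6.3) is very far from being understood when one goes beyond
the case $N \sim n$. A reasonable conjecture is that for any $N \ge n$, with high
probability/in expectation,
\[
\sup_{\theta\in S^{n-1}}\Bigl|\frac{1}{N}\sum_{i=1}^N\langle X_i,\theta\rangle^2 - 1\Bigr| \le c\sqrt{\frac{n}{N}}, \qquad (6.6)
\]
which is the situation for the gaussian ensemble.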
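The behavior of the process (6.3) for the gaussian ensemble can be observed in a toy simulation. The sketch below is an illustration only, under the assumption $n = 2$ with standard gaussian points, chosen because the supremum over the unit circle is then the largest absolute eigenvalue of a symmetric $2 \times 2$ matrix and has a closed form. The function names are hypothetical.

```python
import math
import random


def covariance_deviation(N, rng):
    """sup over the unit circle of |(1/N) sum <X_i, theta>^2 - 1| for N standard gaussian points in R^2.

    The supremum equals the largest absolute eigenvalue of the matrix
    (1/N) sum X_i X_i^T - Id, which is [[a, b], [b, d]] below.
    """
    a = b = d = 0.0
    for _ in range(N):
        x, y = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        a += x * x
        b += x * y
        d += y * y
    a, b, d = a / N - 1.0, b / N, d / N - 1.0
    mid = (a + d) / 2.0
    rad = math.hypot((a - d) / 2.0, b)
    return max(abs(mid + rad), abs(mid - rad))


def mean_deviation(N, trials=50, seed=0):
    """Average of covariance_deviation over independent trials."""
    rng = random.Random(seed)
    return sum(covariance_deviation(N, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    for N in (40, 400, 4000):
        # The last column is roughly constant if the deviation scales like sqrt(n/N).
        dev = mean_deviation(N, trials=20)
        print(N, round(dev, 4), round(dev * math.sqrt(N / 2), 3))
```

This two-dimensional experiment says nothing about the log-concave case or about the dependence on $n$; it only shows the $\sqrt{n/N}$ scaling in $N$ that (6.6) postulates.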

Below, we will indicate some of the problems one faces when trying to
verify this conjecture, with the main one being that very little is known on
the metric structure endowed on $S^{n-1}$ by a log-concave measure.

Currently, the best estimate on (6.3) in the range $c(\beta)n^{1+\beta} \le N \le \exp(\sqrt{n})$
for any $\beta > 0$ is $c_1(\beta)\sqrt{(n\log n)/N}$. This is a corollary of Theorem
A, and the suboptimal estimate from [23], that $\gamma_2(S^{n-1},\psi_2) \lesssim \sqrt{n\log n}$ for
$\mu$ that is supported in $c_2\sqrt{n}B_2^n$ (the so-called small diameter case). Note
that the small diameter assumption can be made without loss of generality
as long as $N \le \exp(\sqrt{n})$ thanks to the result of Paouris [30] which states
that for $N \le \exp(\sqrt{n})$, $E\max_{i\le N}\|X_i\|_2 \lesssim \sqrt{n}$. Hence, for those values
of $N$, one may assume that $\mu$ is supported in $c_1\sqrt{n}B_2^n$, implying that if
$c(\beta)n^{1+\beta} \le N \le \exp(\sqrt{n})$ then Theorem A improves Theorem 6.4 and gives
the best known estimate on (6.3).

We believe that under the small diameter assumption, the extra logarithmic
term in $\gamma_2(S^{n-1},\psi_2)$ could be removed. Indeed, if $\mu$ is supported
on a ball of radius $\sim \sqrt{n}$, "most" directions $\theta \in S^{n-1}$ have a $\psi_2$ norm that
is bounded by an absolute constant (see, for example, [K]). Unfortunately,
even under a small diameter assumption, there is very little information on
the geometry of the set of these "good" directions, except that it is a very
large subset of the sphere.

The second step towards a complete solution, and most likely the more
difficult one, is when $N > \exp(\sqrt{n})$. Here, one can no longer assume that
$\mu$ is supported in a ball of radius $\sim \sqrt{n}$, and thus both Theorem 6.4 and
the bound on $\gamma_2(S^{n-1},\psi_2)$ from [25] fail. Moreover, when leaving the small
diameter case, it is not known whether there is even a single direction $\theta$ for
which $\|\theta\|_{\psi_2} \sim 1$.
6.3.1 The unconditional case

We end this note with an example of how the $\sqrt{\log n}$ factor may be removed
in a special case, when $\mu$ is unconditional. This example illustrates the
difficulties that one is likely to encounter in the general case, where there is
little structure at our disposal.

The argument has two parts. First, we will show that one may consider a
slightly different "small diameter" assumption, and second, that under this
assumption, the metric entropy $\log N(S^{n-1},\varepsilon B_{\psi_2})$ is well behaved.

For the first part, note that by the Bobkov-Nazarov Theorem [7, 16], if
$N \sim n^\alpha$ and if we denote the $j$-th coordinate of a monotone rearrangement
of the coordinates of the vector $X_i$ by $(X_i)_j^*$, then with high probability, for
every $1 \le i \le N$ and $1 \le j \le n$, $(X_i)_j^* \le c_\alpha\log(en/j)$. Hence, without loss
of generality we may assume that $\mu$ is supported in $c_1(\alpha)B_{\psi_1^n}$. This gives
more accurate information than the standard small diameter assumption,
that $\mu$ is supported in $c\sqrt{n}B_2^n$. In particular, we may assume that almost
surely, for every $j \le n$, $(X)_j^* \le c_\alpha\log(en/j)$. Since $\mu$ is unconditional, then
for every $\theta \in S^{n-1}$ the random variable $\langle X,\theta\rangle$ has the same distribution
as $\sum_{i=1}^n\varepsilon_i|\langle X,e_i\rangle|\theta_i$, where $(\varepsilon_i)_{i=1}^n$ are i.i.d. Bernoulli random variables.
Hence, for any $p \ge 1$,



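\[
\bigl(E_X|\langle X,\theta\rangle|^p\bigr)^{1/p} = \Bigl(E_XE_\varepsilon\Bigl|\sum_{i=1}^n\varepsilon_i|\langle X,e_i\rangle|\theta_i\Bigr|^p\Bigr)^{1/p}
\le c\sqrt{p}\Bigl(E_X\Bigl(\sum_{i=1}^n\langle X,e_i\rangle^2\theta_i^2\Bigr)^{p/2}\Bigr)^{1/p}
\]
\[
\le c\sqrt{p}\Bigl(E_X\Bigl(\sum_{j=1}^n\bigl((X)_j^*\bigr)^2(\theta_j^*)^2\Bigr)^{p/2}\Bigr)^{1/p}
\le c_1(\alpha)\sqrt{p}\Bigl(\sum_{j=1}^n(\theta_j^*)^2\log^2(en/j)\Bigr)^{1/2}.
\]
In particular, for every $\theta \in S^{n-1}$,
\[
\|\theta\|_{\psi_2} \le c(\alpha)\Bigl(\sum_{j=1}^n(\theta_j^*)^2\log^2(en/j)\Bigr)^{1/2}
\]
and $\mathrm{diam}(S^{n-1},\psi_2) \le c(\alpha)\log n$.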

Now, just as in [25] one may show that for every $\varepsilon \le 2$, the covering
numbers satisfy $N(S^{n-1},\varepsilon B_{\psi_2}) \le (c_2/\varepsilon)^n$. Thus, it remains to estimate the
covering numbers for larger scales.
To that end, we will use a minor modification of the sets $N_\ell$ and $B_m$
that appeared in Section 3.

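Let $A_\ell = \{z \in B_2^n : |\mathrm{supp}(z)| \le \ell,\ \|z\|_\infty \le 1/\sqrt{\ell}\}$, fix $r$ such that $2^r \le n/10$
and let $\varepsilon_r = \log(en/2^r)$. Set $N_{2^j} \subset A_{2^j}$ to be an $\varepsilon_r(2^j/n)$-cover of $A_{2^j}$
with respect to the $\ell_2$ norm and define
\[
B_r = \Bigl\{z \in B_2^n : |\mathrm{supp}(z)| \le 2^r,\ \mathrm{supp}(z) = \bigcup_{j=0}^{r-1}I_j,\ P_{I_j}z \in N_{2^j}\Bigr\},
\]
where $I_j$ are disjoint sets of coordinates with $|I_0| = 2$ and $|I_j| = 2^j$ for $j \ge 1$.
It is standard to verify that $|B_r| \le \exp(c_02^r\log(en/2^r))$ and that for
every $\theta \in S^{n-1}$ there is some $\tilde\theta \in B_r$ whose support is denoted by $I$, such
that
\[
\|\theta - \tilde\theta\|_{\psi_2} \le c_1\Bigl(\sum_{j=0}^{r-1}\|P_{I_j}(\theta - \tilde\theta)\|_2\log(en/2^j) + \|P_{I^c}\theta\|_2\log(en/2^r)\Bigr)
\]
\[
\le c_1\Bigl(\frac{\varepsilon_r}{n}\sum_{j=0}^{r-1}2^j\log(en/2^j) + \log(en/2^r)\Bigr) \le c_2\varepsilon_r
\]
for $c_1$ and $c_2$ that depend on $\alpha$.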

Therefore, $B_r$ is a $c_2\varepsilon_r$-cover of $S^{n-1}$ with respect to the $\psi_2$ norm, implying
that
\[
\log N(S^{n-1},c_2\varepsilon_rB_{\psi_2}) \le c\,2^r\log(en/2^r).
\]
It is well known [36] that if $(T,d)$ is a metric space then
\[
\gamma_2(T,d) \lesssim \int_0^{\mathrm{diam}(T,d)}\sqrt{\log N(T,\varepsilon,d)}\,d\varepsilon,
\]
and thus a simple calculation of this entropy integral shows that
\[
\gamma_2(S^{n-1},\psi_2) \le c_3(\alpha)\sqrt{n},
\]
proving our claim. ■
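The "simple calculation" for the large scales can be sketched numerically. The code below is an illustration under simplifying assumptions: all constants ($c$, $c_2$) are set to 1, the scales are $\varepsilon_r = \log(en/2^r)$ for $1 \le r \le R$ with $R$ the largest integer satisfying $2^R \le n/10$, and the integrand on $[\varepsilon_r, \varepsilon_{r-1}]$ is bounded by its value at $\varepsilon_r$, using that covering numbers decrease as the scale grows. The function name `large_scale_entropy_sum` is hypothetical. The small scales $\varepsilon \le 2$, which are handled by the bound $(c_2/\varepsilon)^n$, are not included.

```python
import math


def large_scale_entropy_sum(n):
    """Upper Riemann sum for the large-scale part of the entropy integral, divided by sqrt(n).

    Uses log N(eps_r) <= 2^r * log(en / 2^r) with all constants set to 1 (an assumption).
    """
    # R is the largest r with 2^r <= n / 10.
    R = 0
    while 2 ** (R + 1) <= n / 10:
        R += 1
    total = 0.0
    for r in range(1, R + 1):
        eps_prev = math.log(math.e * n / 2 ** (r - 1))
        eps_r = math.log(math.e * n / 2 ** r)
        width = eps_prev - eps_r  # equals log 2 for every r
        log_covering = 2 ** r * math.log(math.e * n / 2 ** r)
        total += width * math.sqrt(log_covering)
    return total / math.sqrt(n)


if __name__ == "__main__":
    # The ratio stays bounded, in contrast with the sqrt(log n) growth
    # of the bound gamma_2 <~ sqrt(n log n).
    for m in (10, 20, 30):
        print(m, round(large_scale_entropy_sum(2 ** m), 3))
```

The sum is dominated by the terms with $2^r$ comparable to $n$, where the geometric growth of $2^{r/2}$ outweighs the logarithmic factor; this is the mechanism that removes the $\sqrt{\log n}$ term in the unconditional case.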